llama.cpp

Commit Graph

Author: shaobo.xie
SHA1: d176ae1c61

build: add GGML_DISABLE_MOE_SUM_CUDA compile flag for moe_sum comparison

This allows disabling the CUDA implementation of ggml_moe_sum to
compare performance with ggml_cuda_op_fused_add.

When GGML_DISABLE_MOE_SUM_CUDA is defined:
- moesum.cu becomes empty (no CUDA kernel)
- ggml_moe_sum falls back to the CPU implementation
- Additionally setting LLAMA_DISABLE_MOE_SUM=1 uses a ggml_add loop
  instead, which triggers ggml_cuda_op_fused_add

Usage for comparison:
- ggml_moe_sum (CUDA): default (both flags unset)
- ggml_cuda_op_fused_add: -DGGML_DISABLE_MOE_SUM_CUDA=1 -DLLAMA_DISABLE_MOE_SUM=1
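
The flag combinations above can be sketched as a small compile-time selection. This is only an illustration of the guard pattern, assuming both flags are plain preprocessor defines as the -D usage suggests; moe_sum_backend and expert_sum_path are hypothetical helper names, not functions from ggml or llama.cpp, and the returned strings merely label the path that would be taken.

```cpp
#include <string>

// Hypothetical stand-in: reports which ggml_moe_sum implementation is compiled in.
// With GGML_DISABLE_MOE_SUM_CUDA defined, the CUDA kernel is compiled out and
// only the CPU fallback remains.
#ifndef GGML_DISABLE_MOE_SUM_CUDA
static const char * moe_sum_backend() { return "cuda"; }
#else
static const char * moe_sum_backend() { return "cpu"; }
#endif

// Hypothetical stand-in: reports how the expert outputs would be summed.
// With LLAMA_DISABLE_MOE_SUM defined, a loop of ggml_add calls replaces
// ggml_moe_sum (the path the commit message says triggers ggml_cuda_op_fused_add).
static std::string expert_sum_path() {
#ifdef LLAMA_DISABLE_MOE_SUM
    return "ggml_add loop";
#else
    return std::string("ggml_moe_sum (") + moe_sum_backend() + ")";
#endif
}
```

Compiled with neither flag, expert_sum_path() reports the CUDA ggml_moe_sum path; compiled with both -DGGML_DISABLE_MOE_SUM_CUDA=1 and -DLLAMA_DISABLE_MOE_SUM=1, it reports the ggml_add loop, matching the two configurations listed for comparison.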

2026-02-06 11:44:53 +08:00

1 Commit