Commit Graph

1 Commit

Author SHA1 Message Date
shaobo.xie d176ae1c61 build: add GGML_DISABLE_MOE_SUM_CUDA compile flag for moe_sum comparison
This allows disabling the CUDA implementation of ggml_moe_sum so that
its performance can be compared with ggml_cuda_op_fused_add.

When GGML_DISABLE_MOE_SUM_CUDA is defined:
- moesum.cu becomes empty (no CUDA kernel)
- ggml_moe_sum falls back to the CPU implementation
- Setting LLAMA_DISABLE_MOE_SUM=1 uses a ggml_add loop instead,
  which triggers ggml_cuda_op_fused_add
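
The selection described in the list above could be sketched roughly as follows. This is a hypothetical illustration, not code from the commit: the enum and the function name are invented here, and only the two macro names come from the message. It also assumes that LLAMA_DISABLE_MOE_SUM is a compile-time definition and that it alone is enough to select the ggml_add loop, which the message does not state explicitly.

```cpp
// Hypothetical sketch: MoeSumPath and selected_moe_sum_path() are
// illustrative names and are not part of ggml or llama.cpp.
// Only the two macro names are taken from the commit message.
enum class MoeSumPath {
    MoeSumCuda,    // ggml_moe_sum with its CUDA kernel (default build)
    MoeSumCpu,     // ggml_moe_sum without a CUDA kernel: CPU fallback
    FusedAddLoop,  // ggml_add loop, which triggers ggml_cuda_op_fused_add
};

// Resolved entirely by the preprocessor, so the choice is fixed per build.
constexpr MoeSumPath selected_moe_sum_path() {
#if defined(LLAMA_DISABLE_MOE_SUM)
    // Assumption: this flag on its own switches to the ggml_add loop.
    return MoeSumPath::FusedAddLoop;
#elif defined(GGML_DISABLE_MOE_SUM_CUDA)
    return MoeSumPath::MoeSumCpu;
#else
    return MoeSumPath::MoeSumCuda;
#endif
}
```

With neither macro defined, the function returns MoeSumPath::MoeSumCuda, matching the default case in the message.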

Usage for comparison:
- ggml_moe_sum (CUDA): default (both flags unset)
- ggml_cuda_op_fused_add: -DGGML_DISABLE_MOE_SUM_CUDA=1 -DLLAMA_DISABLE_MOE_SUM=1
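
For orientation, the two configurations above are expected to produce the same sum and differ only in how the work is organized. The CPU-side sketch below illustrates that under assumptions the commit does not confirm: that ggml_moe_sum reduces the outputs of the selected experts element-wise, and that the data for one token is laid out as n_expert_used contiguous rows of hidden floats. Both function names are hypothetical stand-ins and do not call into ggml.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout (not stated in the commit): `experts` holds n_expert_used
// contiguous rows of `hidden` floats for a single token.

// Single-pass reduction, standing in for what a dedicated moe_sum op computes.
std::vector<float> moe_sum_single_pass(const std::vector<float>& experts,
                                       std::size_t hidden,
                                       std::size_t n_expert_used) {
    assert(experts.size() == hidden * n_expert_used);
    std::vector<float> out(hidden, 0.0f);
    for (std::size_t i = 0; i < hidden; ++i) {
        float acc = 0.0f;
        for (std::size_t e = 0; e < n_expert_used; ++e) {
            acc += experts[e * hidden + i];
        }
        out[i] = acc;
    }
    return out;
}

// Chain of pairwise adds, standing in for the ggml_add loop:
// start from expert 0 and add each remaining expert row in turn.
std::vector<float> moe_sum_add_loop(const std::vector<float>& experts,
                                    std::size_t hidden,
                                    std::size_t n_expert_used) {
    assert(n_expert_used > 0 && experts.size() == hidden * n_expert_used);
    std::vector<float> out(experts.begin(),
                           experts.begin() + static_cast<std::ptrdiff_t>(hidden));
    for (std::size_t e = 1; e < n_expert_used; ++e) {
        for (std::size_t i = 0; i < hidden; ++i) {
            out[i] += experts[e * hidden + i];
        }
    }
    return out;
}
```

The single-pass version reads every expert row once and writes the output once, while the add loop makes one pass over the output per expert. That difference in memory traffic is the kind of effect a benchmark between the two build configurations would measure.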
2026-02-06 11:44:53 +08:00