This allows disabling the CUDA implementation of ggml_moe_sum to compare performance with ggml_cuda_op_fused_add.

When GGML_DISABLE_MOE_SUM_CUDA is defined:
- moesum.cu becomes empty (no CUDA kernel)
- ggml_moe_sum falls back to the CPU implementation
- Setting LLAMA_DISABLE_MOE_SUM=1 uses a ggml_add loop instead, which triggers ggml_cuda_op_fused_add

Usage for comparison:
- ggml_moe_sum (CUDA): default (both flags unset)
- ggml_cuda_op_fused_add: -DGGML_DISABLE_MOE_SUM_CUDA=1 -DLLAMA_DISABLE_MOE_SUM=1

| | | |
|---|---|---|
| cmake | ||
| include | ||
| src | ||
| .gitignore | ||
| CMakeLists.txt | ||
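
The two code paths compared in the commit message above can be illustrated with a small CPU-only sketch. It is not the ggml implementation: the function names and the `SKETCH_DISABLE_MOE_SUM` macro are hypothetical stand-ins, and the layout (one flat vector per expert output) is an assumption made for illustration. It only shows how a compile-time definition could switch between a single-pass reduction and a loop of pairwise adds that produce the same result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dedicated "MoE sum" op: for each output
// element, accumulate the contributions of all experts in one pass.
std::vector<float> moe_sum_single_pass(const std::vector<std::vector<float>>& experts) {
    if (experts.empty()) return {};
    std::vector<float> out(experts[0].size(), 0.0f);
    for (std::size_t i = 0; i < out.size(); ++i) {
        float acc = 0.0f;
        for (const auto& e : experts) acc += e[i];
        out[i] = acc;
    }
    return out;
}

// Hypothetical stand-in for the add-loop path: start from the first
// expert output and add the remaining ones, one elementwise add per expert.
std::vector<float> moe_sum_add_loop(const std::vector<std::vector<float>>& experts) {
    if (experts.empty()) return {};
    std::vector<float> out = experts[0];
    for (std::size_t k = 1; k < experts.size(); ++k) {
        for (std::size_t i = 0; i < out.size(); ++i) out[i] += experts[k][i];
    }
    return out;
}

// Compile-time selection, analogous in spirit to the -D flags described
// in the commit message. SKETCH_DISABLE_MOE_SUM is an illustrative macro,
// not one defined by the repository.
std::vector<float> moe_sum(const std::vector<std::vector<float>>& experts) {
#ifdef SKETCH_DISABLE_MOE_SUM
    return moe_sum_add_loop(experts);
#else
    return moe_sum_single_pass(experts);
#endif
}
```

Building this sketch with and without `-DSKETCH_DISABLE_MOE_SUM=1` selects one path or the other behind the same `moe_sum` entry point, which is the kind of A/B setup the commit describes for timing the two CUDA paths.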