llama.cpp/ggml
shaobo.xie d176ae1c61 build: add GGML_DISABLE_MOE_SUM_CUDA compile flag for moe_sum comparison
This makes it possible to disable the CUDA implementation of ggml_moe_sum and
compare its performance against ggml_cuda_op_fused_add.

When GGML_DISABLE_MOE_SUM_CUDA is defined:
- moesum.cu becomes empty (no CUDA kernel)
- ggml_moe_sum falls back to CPU implementation
- Setting LLAMA_DISABLE_MOE_SUM=1 falls back to a ggml_add loop,
  which triggers ggml_cuda_op_fused_add

Usage for comparison:
- ggml_moe_sum (CUDA): default (both flags unset)
- ggml_cuda_op_fused_add: -DGGML_DISABLE_MOE_SUM_CUDA=1 -DLLAMA_DISABLE_MOE_SUM=1
2026-02-06 11:44:53 +08:00
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) 2025-08-07 13:45:41 +02:00
include ggml: add moe_sum operator for Mixture of Experts aggregation 2026-02-05 15:34:24 +08:00
src build: add GGML_DISABLE_MOE_SUM_CUDA compile flag for moe_sum comparison 2026-02-06 11:44:53 +08:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt Bump cmake max version (needed for Windows on Snapdragon builds) (#19188) 2026-02-01 14:13:38 -08:00