llama.cpp

shaobo.xie d176ae1c61 build: add GGML_DISABLE_MOE_SUM_CUDA compile flag for moe_sum comparison	2026-02-06 11:44:53 +08:00

This allows disabling the CUDA implementation of ggml_moe_sum to compare performance with ggml_cuda_op_fused_add.

When GGML_DISABLE_MOE_SUM_CUDA is defined:
- moesum.cu becomes empty (no CUDA kernel)
- ggml_moe_sum falls back to the CPU implementation
- Setting LLAMA_DISABLE_MOE_SUM=1 will use a ggml_add loop, which triggers ggml_cuda_op_fused_add

Usage for comparison:
- ggml_moe_sum (CUDA): default (both flags unset)
- ggml_cuda_op_fused_add: -DGGML_DISABLE_MOE_SUM_CUDA=1 -DLLAMA_DISABLE_MOE_SUM=1
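
The two aggregation paths being compared can be illustrated with a small CPU-only sketch. This is not the ggml code: `Row`, `moe_sum_fused`, `moe_sum_add_loop`, and `moe_sum` are hypothetical names invented here, and treating `LLAMA_DISABLE_MOE_SUM` as a preprocessor macro is an assumption based on the `-D` usage in the commit message. It only shows that a single-pass reduction over expert outputs and a chain of pairwise adds produce the same result, which is why the two can be benchmarked against each other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one expert's output row (hidden_size floats).
using Row = std::vector<float>;

// Single-pass reduction: each output element is accumulated and written once.
// Loosely mirrors the idea of a dedicated moe_sum operator (assumption).
Row moe_sum_fused(const std::vector<Row>& experts) {
    assert(!experts.empty());
    const std::size_t n = experts[0].size();
    Row out(n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (const Row& e : experts) {
            assert(e.size() == n);
            acc += e[i];
        }
        out[i] = acc;
    }
    return out;
}

// Pairwise add loop: out = e0, then out += e1, out += e2, ...
// Loosely mirrors summing experts with repeated add operations (assumption).
Row moe_sum_add_loop(const std::vector<Row>& experts) {
    assert(!experts.empty());
    Row out = experts[0];
    for (std::size_t k = 1; k < experts.size(); ++k) {
        assert(experts[k].size() == out.size());
        for (std::size_t i = 0; i < out.size(); ++i) {
            out[i] += experts[k][i];
        }
    }
    return out;
}

// Compile-time selection between the two paths, modelled on the flag names
// in the commit message; the real wiring in the repository may differ.
Row moe_sum(const std::vector<Row>& experts) {
#ifdef LLAMA_DISABLE_MOE_SUM
    return moe_sum_add_loop(experts);
#else
    return moe_sum_fused(experts);
#endif
}
```

Building this sketch with `-DLLAMA_DISABLE_MOE_SUM=1` would route `moe_sum` through the add loop. Both paths add the experts in the same order, so they return identical values and any measured difference comes from how the work is scheduled.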
cmake	ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)	2025-08-07 13:45:41 +02:00
include	ggml: add moe_sum operator for Mixture of Experts aggregation	2026-02-05 15:34:24 +08:00
src	build: add GGML_DISABLE_MOE_SUM_CUDA compile flag for moe_sum comparison	2026-02-06 11:44:53 +08:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	Bump cmake max version (needed for Windows on Snapdragon builds) (#19188)	2026-02-01 14:13:38 -08:00