llama.cpp/ggml
Yu, Zijun 65e1b1af6d Fix after rebasing
- Unify the layout of cache k and cache v: [seq, n_head, head_size]
- Add CPY and FLASH_ATTN_EXT; flash attn is not used yet
- Skip test-backend-ops due to flash attn test crash
- Add a mutex around graph conversion to avoid future test-thread-safety failures (see the sketch after this commit entry)
- Update NPU config
- Update GPU config to disable the SDPA optimization so that phi-3 runs
2026-01-15 11:19:15 -08:00
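The mutex mentioned above serializes the graph-conversion step so concurrent requests cannot convert a graph at the same time. The following is a minimal standalone sketch of that pattern only; convert_graph() and g_convert_mutex are hypothetical placeholders for illustration, not the actual ggml/backend symbols touched by this commit.

// Sketch: serialize a (hypothetical) graph-conversion step across threads,
// which is the kind of race test-thread-safety would otherwise hit.
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

static std::mutex g_convert_mutex;   // guards the conversion step

// Hypothetical stand-in for the backend's graph-conversion work.
static void convert_graph(int graph_id) {
    std::printf("converted graph %d\n", graph_id);
}

static void worker(int graph_id) {
    // Only one thread may convert at a time.
    std::lock_guard<std::mutex> lock(g_convert_mutex);
    convert_graph(graph_id);
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back(worker, i);
    }
    for (auto & t : threads) {
        t.join();
    }
    return 0;
}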
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) 2025-08-07 13:45:41 +02:00
include fix build error 2026-01-15 10:10:00 -08:00
src Fix after rebasing 2026-01-15 11:19:15 -08:00
.gitignore
CMakeLists.txt Refactor: clean, fix warning 2026-01-15 10:20:18 -08:00