llama.cpp/ggml
HyperFoldUK 39137bfe63 ggml : integrate sparse-ternary-fma for TQ2_0 quantization
- Add adapter layer for TQ2_0 encoding conversion
- Implement branchless bitwise encoding conversion
- Add SIMD-accelerated Q8_K to int32 type conversion
- Integrate with ggml_vec_dot_tq2_0_q8_K_generic via threshold dispatch
- Add TQ2_0 test cases to test-backend-ops
- Include sparse-ternary-fma library (dense SIMD kernel)
- 2.3x throughput improvement on AVX-512
2026-01-14 05:20:27 -05:00
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) 2025-08-07 13:45:41 +02:00
include ggml-webgpu: Fix GGML_MEM_ALIGN to 8 for emscripten. (#18628) 2026-01-08 08:36:42 -08:00
src ggml : integrate sparse-ternary-fma for TQ2_0 quantization 2026-01-14 05:20:27 -05:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt ggml : integrate sparse-ternary-fma for TQ2_0 quantization 2026-01-14 05:20:27 -05:00