llama.cpp/docs/backend
Alfred ce734a8a2f
ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (#17977)
* feat: implement real Q8_0

* feat: adding cmake option for configuring FP32 quantize group size

* typo: set() shall be used

---------

Co-authored-by: ngdxzy <zhenyu_xu@uri.edu>
2025-12-19 09:42:28 -08:00
hexagon | ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (#17977) | 2025-12-19 09:42:28 -08:00
BLIS.md | make : deprecate (#10514) | 2024-12-02 21:22:53 +02:00
CANN.md | CANN: GGML_CANN_ACL_GRAPH works only USE_ACL_GRAPH enabled (#16861) | 2025-11-12 14:37:52 +08:00
CUDA-FEDORA.md | docs: update: improve the Fedoa CUDA guide (#12536) | 2025-03-24 11:02:26 +00:00
OPENCL.md | opencl: update doc (#17011) | 2025-11-04 16:02:36 -08:00
SYCL.md | added note for old Intel hardware pre sycl (#18017) | 2025-12-16 17:45:09 +08:00
ZenDNN.md | ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690) | 2025-12-07 00:13:33 +08:00
zDNN.md | ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690) | 2025-12-07 00:13:33 +08:00