llama.cpp

History

Daniel Bevenius 62cef26ac5 model-conversion : add qat-q4 quantization targets (#15588) This commit adds two targets to the Makefile for quantizing Quantization-Aware Trained (QAT) models to Q4_0 format. The motivation is that these targets set the data types of the token embedding and output tensors to Q8_0 instead of the default Q6_K. This is something that we wish to enforce for QAT Q4_0 models that are to be uploaded to ggml-org on Hugging Face to guarantee the best quality.	2025-08-26 16:12:29 +02:00
causal	model-conversion : add model card template for embeddings [no ci] (#15557)	2025-08-25 14:25:25 +02:00
embedding	model-conversion : add model card template for embeddings [no ci] (#15557)	2025-08-25 14:25:25 +02:00
utils	model-conversion : add qat-q4 quantization targets (#15588)	2025-08-26 16:12:29 +02:00
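
The latest commit message describes quantizing a QAT model to Q4_0 while keeping the token embedding and output tensors at Q8_0. A minimal sketch of the kind of command such a Makefile target might end up running is shown below. The `--token-embedding-type` and `--output-tensor-type` options belong to `llama-quantize`; the function name, the binary path, and the file names are hypothetical, and the actual targets added in #15588 may be wired differently.

```shell
#!/usr/bin/env bash
# Sketch only: prints (does not run) a llama-quantize command line that
# quantizes to Q4_0 while overriding the token embedding and output tensor
# types to Q8_0. The function name and default binary path are assumptions.

build_qat_q4_0_cmd() {
    local input_model="$1"    # high-precision GGUF produced by conversion
    local output_model="$2"   # destination path for the Q4_0 GGUF
    local quantize_bin="${QUANTIZE_BIN:-./build/bin/llama-quantize}"

    # Options come first, then input, output, and the target type.
    printf '%s ' "$quantize_bin" \
        --token-embedding-type Q8_0 \
        --output-tensor-type Q8_0 \
        "$input_model" "$output_model"
    printf '%s\n' "Q4_0"
}
```

Calling `build_qat_q4_0_cmd model-f16.gguf model-Q4_0.gguf` prints the command so it can be inspected before being run. Printing instead of executing keeps the sketch safe to try without a built `llama-quantize` binary.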