llama.cpp

History

Daniel Bevenius 62cef26ac5 model-conversion : add qat-q4 quantization targets (#15588 ) This commit adds two targets to the Makefile for quantizing of Quantization Aware Trained (QAT) models to Q4_0 format. The motivation for this is that this sets the token embedding and the output tensors data types to Q8_0 instead of the default Q6_K. This is someting that we wish to enforce for QAT Q4_0 models that are to be uploaded to ggml-org on Huggingface to guarantee the best quality.		2025-08-26 16:12:29 +02:00
..
batched	common : refactor downloading system, handle mmproj with -hf option (#12694 )	2025-04-01 23:44:05 +02:00
batched.swift	examples : remove references to `make` in examples [no ci] (#15457 )	2025-08-21 06:12:28 +02:00
convert-llama2c-to-ggml	llama : add `llama_vocab`, functions -> methods, naming (#11110 )	2025-01-12 11:32:42 +02:00
deprecation-warning	Update deprecation-warning.cpp (#10619 )	2024-12-04 23:19:20 +01:00
diffusion	Add LLaDA 8b Diffusion model (#14771 )	2025-07-31 19:49:09 +08:00
embedding	tests : update for LLAMA_SET_ROWS=1 (#14961 )	2025-07-30 15:12:02 +03:00
eval-callback	eval-callback : stop on first NaN (#15320 )	2025-08-14 22:10:51 +03:00
gen-docs	ggml : move AMX to the CPU backend (#10570 )	2024-11-29 21:54:58 +01:00
gguf	GGUF: C++ refactor, backend support, misc fixes (#11030 )	2025-01-07 18:01:58 +01:00
gguf-hash	GGUF: C++ refactor, backend support, misc fixes (#11030 )	2025-01-07 18:01:58 +01:00
gritlm	llama : rework embeddings logic (#14208 )	2025-06-16 14:14:00 +03:00
jeopardy	scripts : make the shell scripts cross-platform (#14341 )	2025-06-30 10:17:18 +02:00
llama.android	llama : deprecate llama_kv_self_ API (#14030 )	2025-06-06 14:11:15 +03:00
llama.swiftui	llama : deprecate llama_kv_self_ API (#14030 )	2025-06-06 14:11:15 +03:00
lookahead	lookahead : add sample command to readme (#15447 )	2025-08-20 13:30:46 +03:00
lookup	llama : deprecate llama_kv_self_ API (#14030 )	2025-06-06 14:11:15 +03:00
model-conversion	model-conversion : add qat-q4 quantization targets (#15588 )	2025-08-26 16:12:29 +02:00
parallel	parallel : add option for different RNG seeds (#14757 )	2025-07-18 17:33:41 +03:00
passkey	examples : remove references to `make` in examples [no ci] (#15457 )	2025-08-21 06:12:28 +02:00
retrieval	examples : remove references to `make` in examples [no ci] (#15457 )	2025-08-21 06:12:28 +02:00
save-load-state	tests : update for LLAMA_SET_ROWS=1 (#14961 )	2025-07-30 15:12:02 +03:00
simple	fix: check model pointer validity before use (#13631 )	2025-05-19 13:25:41 +03:00
simple-chat	simple-chat : fix context-exceeded condition (#14494 )	2025-07-02 14:12:07 +03:00
simple-cmake-pkg	repo : update links to new url (#11886 )	2025-02-15 16:40:57 +02:00
speculative	common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-moe-draft parameters (#15191 )	2025-08-13 12:44:40 +02:00
speculative-simple	common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-moe-draft parameters (#15191 )	2025-08-13 12:44:40 +02:00
sycl	examples : remove references to `make` in examples [no ci] (#15457 )	2025-08-21 06:12:28 +02:00
training	finetune: SGD optimizer, more CLI args (#13873 )	2025-08-14 12:03:57 +02:00
CMakeLists.txt	examples : add model conversion tool/example (#15455 )	2025-08-21 12:16:54 +02:00
Miku.sh	scripts : make the shell scripts cross-platform (#14341 )	2025-06-30 10:17:18 +02:00
chat-13B.bat	Create chat-13B.bat (#592 )	2023-03-29 20:21:09 +03:00
chat-13B.sh	scripts : make the shell scripts cross-platform (#14341 )	2025-06-30 10:17:18 +02:00
chat-persistent.sh	scripts : make the shell scripts cross-platform (#14341 )	2025-06-30 10:17:18 +02:00
chat-vicuna.sh	scripts : make the shell scripts cross-platform (#14341 )	2025-06-30 10:17:18 +02:00
chat.sh	scripts : make the shell scripts cross-platform (#14341 )	2025-06-30 10:17:18 +02:00
convert_legacy_llama.py	metadata: Detailed Dataset Authorship Metadata (#8875 )	2024-11-13 21:10:38 +11:00
json_schema_pydantic_example.py	py : type-check all Python scripts with Pyright (#8341 )	2024-07-07 15:04:39 -04:00
json_schema_to_grammar.py	grammar : handle maxItems == 0 in JSON schema (#13117 )	2025-04-26 10:10:20 +02:00
llama.vim	llama : remove KV cache defragmentation logic (#15473 )	2025-08-22 12:22:13 +03:00
llm.vim	llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879 )	2023-08-30 09:50:55 +03:00
pydantic_models_to_grammar.py	pydantic : replace uses of __annotations__ with get_type_hints (#8474 )	2024-07-14 19:51:21 -04:00
pydantic_models_to_grammar_examples.py	llama : move end-user examples to tools directory (#13249 )	2025-05-02 20:27:13 +02:00
reason-act.sh	scripts : make the shell scripts cross-platform (#14341 )	2025-06-30 10:17:18 +02:00
regex_to_grammar.py	py : switch to snake_case (#8305 )	2024-07-05 07:53:33 +03:00
server-llama2-13B.sh	scripts : make the shell scripts cross-platform (#14341 )	2025-06-30 10:17:18 +02:00
server_embd.py	llama : fix FA when KV cache is not used (i.e. embeddings) (#12825 )	2025-04-08 19:54:51 +03:00
ts-type-to-grammar.sh	scripts : make the shell scripts cross-platform (#14341 )	2025-06-30 10:17:18 +02:00