llama.cpp/examples

Latest commit: model-conversion : add mmproj conversion target (#15628) by Daniel Bevenius (46d9caa27a)
This commit adds a new Makefile target for converting multimodal models. The target converts the original model and also produces the companion mmproj GGUF model.

The motivation for this change is that for multimodal models, for example those that contain a vision encoder, we will often want to upload both the quantized model and the vision encoder (mmproj) model to HuggingFace.

Example usage:
```console
$ make causal-convert-mm-model MODEL_PATH=~/work/ai/models/gemma-3-4b-it-qat-q4_0-unquantized/
...
The environment variable CONVERTED_MODEL can be set to this path using:
export CONVERTED_MODEL=/home/danbev/work/ai/llama.cpp/models/gemma-3-4b-it-qat-q4_0-unquantized.gguf
The mmproj model was created in /home/danbev/work/ai/llama.cpp/models/mmproj-gemma-3-4b-it-qat-q4_0-unquantized.gguf
```
The converted original model can then be quantized, after which both the quantized model and the mmproj file can be uploaded to HuggingFace.
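As a rough sketch of that quantize-and-upload workflow (the output file names and the HuggingFace repo name below are hypothetical; `llama-quantize` is the quantization tool built with llama.cpp, and `huggingface-cli` is provided by the `huggingface_hub` Python package):

```console
# Quantize the converted model; Q4_0 is just one example quantization type.
$ ./build/bin/llama-quantize "$CONVERTED_MODEL" model-Q4_0.gguf Q4_0

# The mmproj file is uploaded as-is alongside the quantized model.
$ huggingface-cli upload my-org/my-model-GGUF model-Q4_0.gguf
$ huggingface-cli upload my-org/my-model-GGUF mmproj-model.gguf
```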

Refs: https://huggingface.co/ggml-org/gemma-3-4b-it-qat-GGUF/tree/main
Committed: 2025-08-28 09:26:48 +02:00
| Name | Latest commit | Date |
|------|---------------|------|
| batched | common : refactor downloading system, handle mmproj with -hf option (#12694) | 2025-04-01 23:44:05 +02:00 |
| batched.swift | examples : remove references to `make` in examples [no ci] (#15457) | 2025-08-21 06:12:28 +02:00 |
| convert-llama2c-to-ggml | llama : add `llama_vocab`, functions -> methods, naming (#11110) | 2025-01-12 11:32:42 +02:00 |
| deprecation-warning | Update deprecation-warning.cpp (#10619) | 2024-12-04 23:19:20 +01:00 |
| diffusion | Add LLaDA 8b Diffusion model (#14771) | 2025-07-31 19:49:09 +08:00 |
| embedding | tests : update for LLAMA_SET_ROWS=1 (#14961) | 2025-07-30 15:12:02 +03:00 |
| eval-callback | eval-callback : stop on first NaN (#15320) | 2025-08-14 22:10:51 +03:00 |
| gen-docs | ggml : move AMX to the CPU backend (#10570) | 2024-11-29 21:54:58 +01:00 |
| gguf | GGUF: C++ refactor, backend support, misc fixes (#11030) | 2025-01-07 18:01:58 +01:00 |
| gguf-hash | GGUF: C++ refactor, backend support, misc fixes (#11030) | 2025-01-07 18:01:58 +01:00 |
| gritlm | llama : rework embeddings logic (#14208) | 2025-06-16 14:14:00 +03:00 |
| jeopardy | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00 |
| llama.android | llama : deprecate llama_kv_self_ API (#14030) | 2025-06-06 14:11:15 +03:00 |
| llama.swiftui | llama : deprecate llama_kv_self_ API (#14030) | 2025-06-06 14:11:15 +03:00 |
| lookahead | lookahead : add sample command to readme (#15447) | 2025-08-20 13:30:46 +03:00 |
| lookup | llama : deprecate llama_kv_self_ API (#14030) | 2025-06-06 14:11:15 +03:00 |
| model-conversion | model-conversion : add mmproj conversion target (#15628) | 2025-08-28 09:26:48 +02:00 |
| parallel | parallel : add option for different RNG seeds (#14757) | 2025-07-18 17:33:41 +03:00 |
| passkey | examples : remove references to `make` in examples [no ci] (#15457) | 2025-08-21 06:12:28 +02:00 |
| retrieval | examples : remove references to `make` in examples [no ci] (#15457) | 2025-08-21 06:12:28 +02:00 |
| save-load-state | tests : update for LLAMA_SET_ROWS=1 (#14961) | 2025-07-30 15:12:02 +03:00 |
| simple | fix: check model pointer validity before use (#13631) | 2025-05-19 13:25:41 +03:00 |
| simple-chat | simple-chat : fix context-exceeded condition (#14494) | 2025-07-02 14:12:07 +03:00 |
| simple-cmake-pkg | repo : update links to new url (#11886) | 2025-02-15 16:40:57 +02:00 |
| speculative | common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-moe-draft parameters (#15191) | 2025-08-13 12:44:40 +02:00 |
| speculative-simple | common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-moe-draft parameters (#15191) | 2025-08-13 12:44:40 +02:00 |
| sycl | examples : remove references to `make` in examples [no ci] (#15457) | 2025-08-21 06:12:28 +02:00 |
| training | finetune: SGD optimizer, more CLI args (#13873) | 2025-08-14 12:03:57 +02:00 |
| CMakeLists.txt | examples : add model conversion tool/example (#15455) | 2025-08-21 12:16:54 +02:00 |
| Miku.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00 |
| chat-13B.bat | Create chat-13B.bat (#592) | 2023-03-29 20:21:09 +03:00 |
| chat-13B.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00 |
| chat-persistent.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00 |
| chat-vicuna.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00 |
| chat.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00 |
| convert_legacy_llama.py | metadata: Detailed Dataset Authorship Metadata (#8875) | 2024-11-13 21:10:38 +11:00 |
| json_schema_pydantic_example.py | py : type-check all Python scripts with Pyright (#8341) | 2024-07-07 15:04:39 -04:00 |
| json_schema_to_grammar.py | grammar : handle maxItems == 0 in JSON schema (#13117) | 2025-04-26 10:10:20 +02:00 |
| llama.vim | llama : remove KV cache defragmentation logic (#15473) | 2025-08-22 12:22:13 +03:00 |
| llm.vim | llm.vim : stop generation at multiple linebreaks, bind to `<F2>` (#2879) | 2023-08-30 09:50:55 +03:00 |
| pydantic_models_to_grammar.py | pydantic : replace uses of `__annotations__` with get_type_hints (#8474) | 2024-07-14 19:51:21 -04:00 |
| pydantic_models_to_grammar_examples.py | llama : move end-user examples to tools directory (#13249) | 2025-05-02 20:27:13 +02:00 |
| reason-act.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00 |
| regex_to_grammar.py | py : switch to snake_case (#8305) | 2024-07-05 07:53:33 +03:00 |
| server-llama2-13B.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00 |
| server_embd.py | llama : fix FA when KV cache is not used (i.e. embeddings) (#12825) | 2025-04-08 19:54:51 +03:00 |
| ts-type-to-grammar.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00 |