llama.cpp/src
Latest commit 444f00b0ec by Daniel Bevenius: llama : remove quantization sanity check (#17788)
* llama : remove quantization sanity check

This commit removes the quantization sanity check for attention layers.

The motivation for this is that there are hybrid models that combine
recurrent layers, expert layers, and attention layers. For these models
the current check fails because the expert layers are not taken into
account. After consideration, it was decided that this check is not
strictly necessary and can be removed to allow for more flexible model
architectures. (A sketch of the kind of check involved is shown below.)

* llama : remove unused pruned_attention_w and is_clip_model vars
Committed on 2025-12-06 12:26:20 +01:00
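For context, the removed check lived in llama-quant.cpp (see the table below) and, per the commit message, verified that the attention layers encountered during quantization matched expectations. The following is a minimal, self-contained sketch of that kind of check, assuming attention layers are identified by a non-zero KV-head count; the function name, parameters, and exact counting rule are illustrative assumptions, not the code that was removed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch only: a quantization-time sanity check that counts the
// layers expected to contain attention (here: layers with a non-zero KV-head
// count) and asserts that the number of attention value-projection ("wv")
// tensors actually quantized matches that count.
static void check_attention_tensor_count(const std::vector<uint32_t> & n_head_kv_per_layer,
                                         int n_attention_wv_seen) {
    int n_attn_layer = 0;
    for (const uint32_t n_head_kv : n_head_kv_per_layer) {
        if (n_head_kv > 0) {
            n_attn_layer++; // recurrent layers report zero KV heads and are skipped
        }
    }
    // For a plain attention-only model the two counts agree. For hybrid models
    // that mix recurrent, expert, and attention layers the counts can diverge,
    // which is the failure mode the commit message describes.
    assert(n_attention_wv_seen == n_attn_layer && "unexpected number of attention tensors");
}
```

Dropping an assertion of this shape trades a loose safety net for compatibility with architectures that the simple counting rule cannot describe.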
| Name | Last commit message | Last commit date |
|---|---|---|
| models | model: support Ministral3 (#17644) | 2025-12-01 12:26:52 +01:00 |
| CMakeLists.txt | model: support Ministral3 (#17644) | 2025-12-01 12:26:52 +01:00 |
| llama-adapter.cpp | aLoRA Support (#15327) | 2025-09-05 17:32:39 -06:00 |
| llama-adapter.h | aLoRA Support (#15327) | 2025-09-05 17:32:39 -06:00 |
| llama-arch.cpp | Override SSM_A op for Qwen3 Next to reduce splits (#17587) | 2025-12-02 00:43:13 +01:00 |
| llama-arch.h | Override SSM_A op for Qwen3 Next to reduce splits (#17587) | 2025-12-02 00:43:13 +01:00 |
| llama-batch.cpp | batch : fix consistency checks for the input positions (#16890) | 2025-10-31 13:50:33 +02:00 |
| llama-batch.h | llama: store mrope data in KV cell (#16825) | 2025-10-29 18:09:18 +01:00 |
| llama-chat.cpp | model : add openPangu-Embedded (#16941) | 2025-11-05 10:28:58 +01:00 |
| llama-chat.h | model : add openPangu-Embedded (#16941) | 2025-11-05 10:28:58 +01:00 |
| llama-context.cpp | ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276) | 2025-11-28 17:33:23 +02:00 |
| llama-context.h | server : support unified cache across slots (#16736) | 2025-11-02 18:14:04 +02:00 |
| llama-cparams.cpp | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 2025-06-15 10:08:58 +03:00 |
| llama-cparams.h | server : support unified cache across slots (#16736) | 2025-11-02 18:14:04 +02:00 |
| llama-grammar.cpp | grammar: fix regression caused by #17381 (#17412) | 2025-11-20 18:35:10 +01:00 |
| llama-grammar.h | `tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) | 2025-03-05 13:05:13 +00:00 |
| llama-graph.cpp | model: support Ministral3 (#17644) | 2025-12-01 12:26:52 +01:00 |
| llama-graph.h | graph : support cacheless embeddings with FA and iSWA (#16528) | 2025-10-13 22:42:37 +03:00 |
| llama-hparams.cpp | hparams : add n_embd_inp() to support extended embed (#16928) | 2025-11-07 19:27:58 +01:00 |
| llama-hparams.h | model: support Ministral3 (#17644) | 2025-12-01 12:26:52 +01:00 |
| llama-impl.cpp | common : more accurate sampling timing (#17382) | 2025-11-20 13:40:10 +02:00 |
| llama-impl.h | ggml, llama : use defaulted constructors/destructors (#17649) | 2025-12-03 07:12:18 +01:00 |
| llama-io.cpp | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| llama-io.h | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| llama-kv-cache-iswa.cpp | kv-cache : pad the cache size to 256 for performance (#17046) | 2025-11-07 20:03:25 +02:00 |
| llama-kv-cache-iswa.h | llama: print memory breakdown on exit (#15860) | 2025-09-24 16:53:48 +02:00 |
| llama-kv-cache.cpp | model: add support for qwen3vl series (#16780) | 2025-10-30 16:19:14 +01:00 |
| llama-kv-cache.h | memory : remove KV cache size padding (#16812) | 2025-10-28 20:19:44 +02:00 |
| llama-kv-cells.h | llama: store mrope data in KV cell (#16825) | 2025-10-29 18:09:18 +01:00 |
| llama-memory-hybrid.cpp | memory : use sequential equal splits for recurrent modules (#16442) | 2025-10-07 08:24:17 +03:00 |
| llama-memory-hybrid.h | llama: print memory breakdown on exit (#15860) | 2025-09-24 16:53:48 +02:00 |
| llama-memory-recurrent.cpp | memory: Hybrid context shift (#17009) | 2025-11-10 17:14:23 +02:00 |
| llama-memory-recurrent.h | llama: consistent ctx <-> buf order for KV cache (#16746) | 2025-10-28 11:23:54 +01:00 |
| llama-memory.cpp | memory : correctly handle failure in apply() (#14438) | 2025-06-30 18:03:03 +03:00 |
| llama-memory.h | llama: print memory breakdown on exit (#15860) | 2025-09-24 16:53:48 +02:00 |
| llama-mmap.cpp | llama : fix signed comparison warning on FreeBSD (#17497) | 2025-12-02 12:05:38 +01:00 |
| llama-mmap.h | llama-mmap: fix missing include (#11796) | 2025-02-10 20:58:18 +02:00 |
| llama-model-loader.cpp | model : Apertus model implementation (#15852) | 2025-10-02 20:43:22 +03:00 |
| llama-model-loader.h | model: support GLM 4.5 family of models (#14939) | 2025-08-04 20:29:25 +02:00 |
| llama-model-saver.cpp | llama : improve sep token handling (#14272) | 2025-06-20 14:04:09 +02:00 |
| llama-model-saver.h | llama/ggml: add LLM training support (#10544) | 2025-05-12 14:44:49 +02:00 |
| llama-model.cpp | ggml, llama : use defaulted constructors/destructors (#17649) | 2025-12-03 07:12:18 +01:00 |
| llama-model.h | model : Qwen3 Next (#16095) | 2025-11-28 12:02:56 +01:00 |
| llama-quant.cpp | llama : remove quantization sanity check (#17788) | 2025-12-06 12:26:20 +01:00 |
| llama-quant.h | llama : refactor `src/llama.cpp` (#10902) | 2025-01-03 10:18:53 +02:00 |
| llama-sampling.cpp | common : more accurate sampling timing (#17382) | 2025-11-20 13:40:10 +02:00 |
| llama-sampling.h | llama : add `llama_vocab`, functions -> methods, naming (#11110) | 2025-01-12 11:32:42 +02:00 |
| llama-vocab.cpp | ggml, llama : use defaulted constructors/destructors (#17649) | 2025-12-03 07:12:18 +01:00 |
| llama-vocab.h | model : add AfmoeForCausalLM support (#16477) | 2025-11-14 13:54:10 +01:00 |
| llama.cpp | llama-quant: add support for mmproj (#16592) | 2025-10-15 14:48:08 +02:00 |
| unicode-data.cpp | server : better security control for public deployments (#9776) | 2024-10-08 13:27:04 +02:00 |
| unicode-data.h | llama : reduce compile time and binary size (#9712) | 2024-10-02 15:49:55 +02:00 |
| unicode.cpp | fix: prevent segfault in tokenizer on highly repetitive input (#17786) | 2025-12-05 13:52:23 +02:00 |
| unicode.h | devops: add s390x & ppc64le CI (#15925) | 2025-09-27 02:03:33 +08:00 |