llama.cpp/src
Latest commit 9ae4143bc6 by Mikko Juola: model : add dots.llm1 architecture support (#14044) (#14118)
Adds:

* Dots1Model to convert_hf_to_gguf.py

* Computation graph code to llama-model.cpp

* Chat template detection for this model to llama-chat.cpp
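The template detection mentioned above follows llama-chat.cpp's general pattern of matching distinctive marker substrings in the Jinja template text. A minimal sketch of that technique in plain C++; the function names and marker strings here are illustrative placeholders, not the actual llama-chat.cpp code:

```cpp
#include <string>

// Hypothetical template identifiers, mirroring the style of the
// LLM_CHAT_TEMPLATE_* enum in llama-chat.h.
enum chat_template_kind {
    TMPL_UNKNOWN,
    TMPL_CHATML,
    TMPL_DOTS1,
};

static bool tmpl_contains(const std::string & haystack, const std::string & needle) {
    return haystack.find(needle) != std::string::npos;
}

// Detect a template by scanning its text for distinctive sentinels.
// The marker strings below are placeholders for illustration only.
chat_template_kind detect_chat_template(const std::string & tmpl) {
    if (tmpl_contains(tmpl, "<|userprompt|>")) {
        return TMPL_DOTS1;
    }
    if (tmpl_contains(tmpl, "<|im_start|>")) {
        return TMPL_CHATML;
    }
    return TMPL_UNKNOWN;
}
```

Ordering matters with this approach: more specific markers must be tested before generic ones, since several templates can share common substrings.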

---

The architecture is called "dots.llm1" (I shortened it to dots1 or
DOTS1 in the code generally).

As of this commit, the only models that follow this architecture are
"dots.llm1.inst" and "dots.llm1.base" from here:

* https://huggingface.co/rednote-hilab/dots.llm1.inst

* https://huggingface.co/rednote-hilab/dots.llm1.base

The model architecture is a combination of Qwen and Deepseek parts, as
seen here:

ffe12627b4/src/transformers/models/dots1/modular_dots1.py
Committed: 2025-06-15 09:52:06 +02:00
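The Deepseek-style part of this mix is the MoE feed-forward: a router scores every expert, keeps the top-k, and renormalizes their weights. A toy sketch of that routing step in plain C++, showing the general top-k softmax gating technique rather than the actual ggml graph code in llama-model.cpp:

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <utility>
#include <vector>

// Toy MoE router: given one token's router logits (one per expert),
// select the k highest-scoring experts and softmax-normalize their
// weights over the selected set only.
std::vector<std::pair<int, float>> route_top_k(const std::vector<float> & logits, int k) {
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);

    // Partially sort expert indices by descending logit; only the
    // first k positions need to be ordered.
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
        [&](int a, int b) { return logits[a] > logits[b]; });

    // Softmax over the selected experts only.
    float sum = 0.0f;
    std::vector<std::pair<int, float>> out;
    for (int i = 0; i < k; ++i) {
        const float w = std::exp(logits[idx[i]]);
        sum += w;
        out.push_back({idx[i], w});
    }
    for (auto & p : out) {
        p.second /= sum;
    }
    return out;
}
```

The token's output is then the weighted sum of the selected experts' FFN outputs; real Deepseek-style layers also add shared experts that every token passes through, which this sketch omits.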
| File | Last commit | Date |
|---|---|---|
| CMakeLists.txt | memory : migrate from llama_kv_cache to more generic llama_memory (#14006) | 2025-06-05 15:29:22 +03:00 |
| llama-adapter.cpp | llama : do not crash if there is no CPU backend (#13395) | 2025-05-09 13:02:07 +02:00 |
| llama-adapter.h | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| llama-arch.cpp | model : add dots.llm1 architecture support (#14044) (#14118) | 2025-06-15 09:52:06 +02:00 |
| llama-arch.h | model : add dots.llm1 architecture support (#14044) (#14118) | 2025-06-15 09:52:06 +02:00 |
| llama-batch.cpp | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 2025-06-15 10:08:58 +03:00 |
| llama-batch.h | batch : auto-gen positions + verify multi-sequence input (#14177) | 2025-06-15 09:18:37 +03:00 |
| llama-chat.cpp | model : add dots.llm1 architecture support (#14044) (#14118) | 2025-06-15 09:52:06 +02:00 |
| llama-chat.h | model : add dots.llm1 architecture support (#14044) (#14118) | 2025-06-15 09:52:06 +02:00 |
| llama-context.cpp | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 2025-06-15 10:08:58 +03:00 |
| llama-context.h | batch : rework llama_batch_allocr (#14153) | 2025-06-13 13:47:55 +03:00 |
| llama-cparams.cpp | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 2025-06-15 10:08:58 +03:00 |
| llama-cparams.h | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 2025-06-15 10:08:58 +03:00 |
| llama-grammar.cpp | `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379) | 2025-05-25 01:48:08 +01:00 |
| llama-grammar.h | `tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) | 2025-03-05 13:05:13 +00:00 |
| llama-graph.cpp | batch : rework llama_batch_allocr (#14153) | 2025-06-13 13:47:55 +03:00 |
| llama-graph.h | batch : rework llama_batch_allocr (#14153) | 2025-06-13 13:47:55 +03:00 |
| llama-hparams.cpp | hparams : initialize arrays (#13728) | 2025-05-23 20:16:13 +03:00 |
| llama-hparams.h | llama : add RobertaForSequenceClassification reranker support (#13875) | 2025-05-29 08:15:01 +02:00 |
| llama-impl.cpp | GGUF: C++ refactor, backend support, misc fixes (#11030) | 2025-01-07 18:01:58 +01:00 |
| llama-impl.h | cleanup: fix compile warnings associated with gnu_printf (#11811) | 2025-02-12 10:06:53 -04:00 |
| llama-io.cpp | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| llama-io.h | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| llama-kv-cache-recurrent.cpp | batch : remove logits_all flag (#14141) | 2025-06-12 11:49:26 +03:00 |
| llama-kv-cache-recurrent.h | batch : remove logits_all flag (#14141) | 2025-06-12 11:49:26 +03:00 |
| llama-kv-cache-unified-iswa.cpp | batch : remove logits_all flag (#14141) | 2025-06-12 11:49:26 +03:00 |
| llama-kv-cache-unified-iswa.h | batch : remove logits_all flag (#14141) | 2025-06-12 11:49:26 +03:00 |
| llama-kv-cache-unified.cpp | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 2025-06-15 10:08:58 +03:00 |
| llama-kv-cache-unified.h | batch : remove logits_all flag (#14141) | 2025-06-12 11:49:26 +03:00 |
| llama-kv-cells.h | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 2025-06-15 10:08:58 +03:00 |
| llama-memory.cpp | kv-cache : refactor the update/defrag mechanism (#13988) | 2025-06-04 18:58:20 +03:00 |
| llama-memory.h | batch : remove logits_all flag (#14141) | 2025-06-12 11:49:26 +03:00 |
| llama-mmap.cpp | llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013) | 2025-06-05 11:57:42 +02:00 |
| llama-mmap.h | llama-mmap: fix missing include (#11796) | 2025-02-10 20:58:18 +02:00 |
| llama-model-loader.cpp | llama : support multiple classifier outputs and labels (#13940) | 2025-06-06 09:03:25 +02:00 |
| llama-model-loader.h | llama : add option to override model tensor buffers (#11397) | 2025-04-02 14:52:01 +02:00 |
| llama-model-saver.cpp | llama/ggml: add LLM training support (#10544) | 2025-05-12 14:44:49 +02:00 |
| llama-model-saver.h | llama/ggml: add LLM training support (#10544) | 2025-05-12 14:44:49 +02:00 |
| llama-model.cpp | model : add dots.llm1 architecture support (#14044) (#14118) | 2025-06-15 09:52:06 +02:00 |
| llama-model.h | model : add dots.llm1 architecture support (#14044) (#14118) | 2025-06-15 09:52:06 +02:00 |
| llama-quant.cpp | quantize : improve tensor-type pattern matching (#13033) | 2025-05-13 19:12:31 +02:00 |
| llama-quant.h | llama : refactor `src/llama.cpp` (#10902) | 2025-01-03 10:18:53 +02:00 |
| llama-sampling.cpp | sampling : make sure samplers return at least 1 token (#13822) | 2025-05-27 12:07:52 +03:00 |
| llama-sampling.h | llama : add `llama_vocab`, functions -> methods, naming (#11110) | 2025-01-12 11:32:42 +02:00 |
| llama-vocab.cpp | vocab : fix build (#14175) | 2025-06-13 20:03:05 +03:00 |
| llama-vocab.h | llama/ggml: add LLM training support (#10544) | 2025-05-12 14:44:49 +02:00 |
| llama.cpp | llama : print hint when loading a model when no backends are loaded (#13589) | 2025-05-16 16:38:07 +02:00 |
| unicode-data.cpp | server : better security control for public deployments (#9776) | 2024-10-08 13:27:04 +02:00 |
| unicode-data.h | llama : reduce compile time and binary size (#9712) | 2024-10-02 15:49:55 +02:00 |
| unicode.cpp | repo : update links to new url (#11886) | 2025-02-15 16:40:57 +02:00 |
| unicode.h | unicode : improve naming style (#10838) | 2024-12-16 12:31:45 +02:00 |