llama.cpp/src
Latest commit 9ae4143bc6 by Mikko Juola: model : add dots.llm1 architecture support (#14044) (#14118)
Adds:

* Dots1Model to convert_hf_to_gguf.py

* Computation graph code to llama-model.cpp

* Chat template detection for this model to llama-chat.cpp
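The template detection mentioned above follows llama-chat.cpp's general pattern of matching distinctive marker substrings in the Jinja template text. A minimal sketch of that technique in plain C++; the function names and marker strings here are illustrative placeholders, not the actual llama-chat.cpp code:

```cpp
#include <string>

// Hypothetical template identifiers, mirroring the style of the
// LLM_CHAT_TEMPLATE_* enum in llama-chat.h.
enum chat_template_kind {
    TMPL_UNKNOWN,
    TMPL_CHATML,
    TMPL_DOTS1,
};

static bool tmpl_contains(const std::string & haystack, const std::string & needle) {
    return haystack.find(needle) != std::string::npos;
}

// Detect a template by scanning its text for distinctive sentinels.
// The marker strings below are placeholders for illustration only.
chat_template_kind detect_chat_template(const std::string & tmpl) {
    if (tmpl_contains(tmpl, "<|userprompt|>")) {
        return TMPL_DOTS1;
    }
    if (tmpl_contains(tmpl, "<|im_start|>")) {
        return TMPL_CHATML;
    }
    return TMPL_UNKNOWN;
}
```

Ordering matters with this approach: more specific markers must be tested before generic ones, since several templates can share common substrings.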

---

The architecture is called "dots.llm1" (I shortened it to dots1 or
DOTS1 in the code generally).

As of this commit, the only models that follow this architecture are
"dots.llm1.inst" and "dots.llm1.base" from here:

* https://huggingface.co/rednote-hilab/dots.llm1.inst

* https://huggingface.co/rednote-hilab/dots.llm1.base

The model architecture is a combination of Qwen and Deepseek parts, as
seen here:

ffe12627b4/src/transformers/models/dots1/modular_dots1.py
Committed: 2025-06-15 09:52:06 +02:00
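The Deepseek-style part of this mix is the MoE feed-forward: a router scores every expert, keeps the top-k, and renormalizes their weights. A toy sketch of that routing step in plain C++, showing the general top-k softmax gating technique rather than the actual ggml graph code in llama-model.cpp:

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <utility>
#include <vector>

// Toy MoE router: given one token's router logits (one per expert),
// select the k highest-scoring experts and softmax-normalize their
// weights over the selected set only.
std::vector<std::pair<int, float>> route_top_k(const std::vector<float> & logits, int k) {
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);

    // Partially sort expert indices by descending logit; only the
    // first k positions need to be ordered.
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
        [&](int a, int b) { return logits[a] > logits[b]; });

    // Softmax over the selected experts only.
    float sum = 0.0f;
    std::vector<std::pair<int, float>> out;
    for (int i = 0; i < k; ++i) {
        const float w = std::exp(logits[idx[i]]);
        sum += w;
        out.push_back({idx[i], w});
    }
    for (auto & p : out) {
        p.second /= sum;
    }
    return out;
}
```

The token's output is then the weighted sum of the selected experts' FFN outputs; real Deepseek-style layers also add shared experts that every token passes through, which this sketch omits.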
| File | Last commit | Date |
|---|---|---|
| CMakeLists.txt | memory : migrate from llama_kv_cache to more generic llama_memory (#14006) | 2025-06-05 15:29:22 +03:00 |
| llama-adapter.cpp | llama : do not crash if there is no CPU backend (#13395) | 2025-05-09 13:02:07 +02:00 |
| llama-adapter.h | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| llama-arch.cpp | model : add dots.llm1 architecture support (#14044) (#14118) | 2025-06-15 09:52:06 +02:00 |
| llama-arch.h | model : add dots.llm1 architecture support (#14044) (#14118) | 2025-06-15 09:52:06 +02:00 |
| llama-batch.cpp | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 2025-06-15 10:08:58 +03:00 |
| llama-batch.h | batch : auto-gen positions + verify multi-sequence input (#14177) | 2025-06-15 09:18:37 +03:00 |
| llama-chat.cpp | model : add dots.llm1 architecture support (#14044) (#14118) | 2025-06-15 09:52:06 +02:00 |
| llama-chat.h | model : add dots.llm1 architecture support (#14044) (#14118) | 2025-06-15 09:52:06 +02:00 |
| llama-context.cpp | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 2025-06-15 10:08:58 +03:00 |
| llama-context.h | batch : rework llama_batch_allocr (#14153) | 2025-06-13 13:47:55 +03:00 |
| llama-cparams.cpp | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 2025-06-15 10:08:58 +03:00 |
| llama-cparams.h | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 2025-06-15 10:08:58 +03:00 |
| llama-grammar.cpp | `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379) | 2025-05-25 01:48:08 +01:00 |
| llama-grammar.h | `tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) | 2025-03-05 13:05:13 +00:00 |
| llama-graph.cpp | batch : rework llama_batch_allocr (#14153) | 2025-06-13 13:47:55 +03:00 |
| llama-graph.h | batch : rework llama_batch_allocr (#14153) | 2025-06-13 13:47:55 +03:00 |
| llama-hparams.cpp | hparams : initialize arrays (#13728) | 2025-05-23 20:16:13 +03:00 |
| llama-hparams.h | llama : add RobertaForSequenceClassification reranker support (#13875) | 2025-05-29 08:15:01 +02:00 |
| llama-impl.cpp | GGUF: C++ refactor, backend support, misc fixes (#11030) | 2025-01-07 18:01:58 +01:00 |
| llama-impl.h | cleanup: fix compile warnings associated with gnu_printf (#11811) | 2025-02-12 10:06:53 -04:00 |
| llama-io.cpp | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| llama-io.h | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| llama-kv-cache-recurrent.cpp | batch : remove logits_all flag (#14141) | 2025-06-12 11:49:26 +03:00 |
| llama-kv-cache-recurrent.h | batch : remove logits_all flag (#14141) | 2025-06-12 11:49:26 +03:00 |
| llama-kv-cache-unified-iswa.cpp | batch : remove logits_all flag (#14141) | 2025-06-12 11:49:26 +03:00 |
| llama-kv-cache-unified-iswa.h | batch : remove logits_all flag (#14141) | 2025-06-12 11:49:26 +03:00 |
| llama-kv-cache-unified.cpp | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 2025-06-15 10:08:58 +03:00 |
| llama-kv-cache-unified.h | batch : remove logits_all flag (#14141) | 2025-06-12 11:49:26 +03:00 |
| llama-kv-cells.h | cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) | 2025-06-15 10:08:58 +03:00 |
| llama-memory.cpp | kv-cache : refactor the update/defrag mechanism (#13988) | 2025-06-04 18:58:20 +03:00 |
| llama-memory.h | batch : remove logits_all flag (#14141) | 2025-06-12 11:49:26 +03:00 |
| llama-mmap.cpp | llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013) | 2025-06-05 11:57:42 +02:00 |
| llama-mmap.h | llama-mmap: fix missing include (#11796) | 2025-02-10 20:58:18 +02:00 |
| llama-model-loader.cpp | llama : support multiple classifier outputs and labels (#13940) | 2025-06-06 09:03:25 +02:00 |
| llama-model-loader.h | llama : add option to override model tensor buffers (#11397) | 2025-04-02 14:52:01 +02:00 |
| llama-model-saver.cpp | llama/ggml: add LLM training support (#10544) | 2025-05-12 14:44:49 +02:00 |
| llama-model-saver.h | llama/ggml: add LLM training support (#10544) | 2025-05-12 14:44:49 +02:00 |
| llama-model.cpp | model : add dots.llm1 architecture support (#14044) (#14118) | 2025-06-15 09:52:06 +02:00 |
| llama-model.h | model : add dots.llm1 architecture support (#14044) (#14118) | 2025-06-15 09:52:06 +02:00 |
| llama-quant.cpp | quantize : improve tensor-type pattern matching (#13033) | 2025-05-13 19:12:31 +02:00 |
| llama-quant.h | llama : refactor `src/llama.cpp` (#10902) | 2025-01-03 10:18:53 +02:00 |
| llama-sampling.cpp | sampling : make sure samplers return at least 1 token (#13822) | 2025-05-27 12:07:52 +03:00 |
| llama-sampling.h | llama : add `llama_vocab`, functions -> methods, naming (#11110) | 2025-01-12 11:32:42 +02:00 |
| llama-vocab.cpp | vocab : fix build (#14175) | 2025-06-13 20:03:05 +03:00 |
| llama-vocab.h | llama/ggml: add LLM training support (#10544) | 2025-05-12 14:44:49 +02:00 |
| llama.cpp | llama : print hint when loading a model when no backends are loaded (#13589) | 2025-05-16 16:38:07 +02:00 |
| unicode-data.cpp | server : better security control for public deployments (#9776) | 2024-10-08 13:27:04 +02:00 |
| unicode-data.h | llama : reduce compile time and binary size (#9712) | 2024-10-02 15:49:55 +02:00 |
| unicode.cpp | repo : update links to new url (#11886) | 2025-02-15 16:40:57 +02:00 |
| unicode.h | unicode : improve naming style (#10838) | 2024-12-16 12:31:45 +02:00 |