llama.cpp/models
Olivier Chafik f5cd27b71d
`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)
* add common_json w/ support for truncated json healing (see the healing sketch below)
* add common_chat_msg_diff (see the diff sketch below)
* partial common_chat_parse
* refactor parser w/ optionals
* server: wire chat diffs in stream mode
* fix trigger of thinking models (must happen after thoughts are closed; see the trigger sketch below)
* fix functionary v3.2 raw python!
* rename: common_chat_syntax (now contains format)
* rm common_regex.at_start
* don't return empty <think></think>
* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
* fix QwQ 32B tool call parsing after thoughts (hermes2)
* better logs for grammar triggers
* consume spaces after parse_json_tool_calls
* fix required tool calls w/ thinking models that have pre-opened thinking tags
* fix thinking model's initial trigger + test qwq's template
* run most test_tool_call tests in stream + non-stream modes
* make functionary v3.2 parsing more strict (differentiate first match from others)
* send final diff from server, to close off raw python arguments
* support partial content streaming in Generic mode
* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
* Update function-calling.md
* Update tool_bench.py
* chat-parser: remove input from exception (llm output may contain PII)

---------

Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
2025-05-25 01:48:08 +01:00
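A few of the changes above are easiest to see in code. First, the truncated-JSON healing behind `common_json`: while streaming, a tool call's JSON arguments arrive cut off mid-way, and healing appends just enough closers for an ordinary parser to accept the prefix. The sketch below is a toy version of that idea; the function name is hypothetical, not the actual `common_json` API, and the real helper must also handle cases this one ignores (e.g. a truncated keyword literal like `tru`).

```cpp
// Toy version of truncated-JSON healing (hypothetical name; not the actual
// common_json API). Tracks open strings/containers in the partial JSON and
// appends just enough closers to make it parseable.
#include <iostream>
#include <string>
#include <vector>

static std::string heal_truncated_json(const std::string & partial) {
    std::vector<char> closers;   // one ']' or '}' per still-open container
    bool in_string = false;
    bool escaped   = false;
    for (char c : partial) {
        if (in_string) {
            if (escaped)        { escaped = false; }
            else if (c == '\\') { escaped = true;  }
            else if (c == '"')  { in_string = false; }
            continue;
        }
        switch (c) {
            case '"': in_string = true;        break;
            case '{': closers.push_back('}');  break;
            case '[': closers.push_back(']');  break;
            case '}':
            case ']': if (!closers.empty()) { closers.pop_back(); } break;
            default : break;
        }
    }
    std::string healed = partial;
    if (escaped)   { healed += '\\'; }   // complete a dangling escape as "\\"
    if (in_string) { healed += '"';  }   // close the open string literal
    for (auto it = closers.rbegin(); it != closers.rend(); ++it) {
        healed += *it;                   // close containers innermost-first
    }
    return healed;
}

int main() {
    // A tool call cut off mid-argument, as it might arrive from the stream:
    std::cout << heal_truncated_json(R"({"name": "get_weather", "arguments": {"city": "Par)") << "\n";
    // -> {"name": "get_weather", "arguments": {"city": "Par"}}
}
```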
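Second, the `common_chat_msg_diff` idea used to wire chat diffs into stream mode: re-parse the accumulated output as each token arrives, then send clients only the delta between the previous parse and the current one. The struct and field names below are illustrative, not llama.cpp's actual types, and the sketch assumes each re-parse strictly extends the previous one (an append-only stream).

```cpp
// Illustrative types, not llama.cpp's: compute the streaming delta between
// the message parsed from the previous partial output and the current one.
#include <iostream>
#include <string>

struct chat_msg {
    std::string content;          // assistant text so far
    std::string tool_call_args;   // raw JSON arguments of the tool call so far
};

struct chat_msg_diff {
    std::string content_delta;
    std::string tool_call_args_delta;
};

// Assumes each re-parse strictly extends the previous one (append-only stream).
static chat_msg_diff compute_diff(const chat_msg & prev, const chat_msg & curr) {
    chat_msg_diff d;
    if (curr.content.size() > prev.content.size()) {
        d.content_delta = curr.content.substr(prev.content.size());
    }
    if (curr.tool_call_args.size() > prev.tool_call_args.size()) {
        d.tool_call_args_delta = curr.tool_call_args.substr(prev.tool_call_args.size());
    }
    return d;
}

int main() {
    chat_msg prev { "Checking the weather.", R"({"city": "Par)"    };
    chat_msg curr { "Checking the weather.", R"({"city": "Paris"})" };
    std::cout << compute_diff(prev, curr).tool_call_args_delta << "\n";  // -> is"}
}
```

Run once more after generation ends, the same computation plausibly yields the "final diff" the bullet list mentions, closing off any still-open raw python arguments.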
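Last, the trigger fix for thinking models: the tool-call grammar must only arm itself once the reasoning block is closed, otherwise a `<tool_call>` mentioned inside the thoughts would fire it early. A minimal sketch of that gating follows; the names are illustrative, and llama.cpp's actual grammar-trigger machinery is more general.

```cpp
// Minimal sketch of gating a tool-call trigger on the reasoning block being
// closed first (illustrative names, not llama.cpp's trigger API).
#include <iostream>
#include <string>

// Only look for the tool-call opener in the text *after* "</think>"; a match
// inside the thoughts must not trigger the tool-call grammar.
static bool should_trigger_tool_grammar(const std::string & output,
                                        const std::string & tool_opener) {
    const std::string think_close = "</think>";
    size_t end_of_thoughts = output.find(think_close);
    if (end_of_thoughts == std::string::npos) {
        return false;  // still thinking: never trigger
    }
    return output.find(tool_opener, end_of_thoughts + think_close.size()) != std::string::npos;
}

int main() {
    std::cout << should_trigger_tool_grammar(
        "<think>maybe call <tool_call>?</think>ok", "<tool_call>") << "\n";  // 0
    std::cout << should_trigger_tool_grammar(
        "<think>done</think><tool_call>", "<tool_call>") << "\n";            // 1
}
```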
templates `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379) 2025-05-25 01:48:08 +01:00
.editorconfig
ggml-vocab-aquila.gguf
ggml-vocab-baichuan.gguf
ggml-vocab-bert-bge.gguf
ggml-vocab-bert-bge.gguf.inp
ggml-vocab-bert-bge.gguf.out
ggml-vocab-chameleon.gguf.inp
ggml-vocab-chameleon.gguf.out
ggml-vocab-command-r.gguf
ggml-vocab-command-r.gguf.inp
ggml-vocab-command-r.gguf.out
ggml-vocab-deepseek-coder.gguf
ggml-vocab-deepseek-coder.gguf.inp
ggml-vocab-deepseek-coder.gguf.out
ggml-vocab-deepseek-llm.gguf
ggml-vocab-deepseek-llm.gguf.inp
ggml-vocab-deepseek-llm.gguf.out
ggml-vocab-deepseek-r1-qwen.gguf.inp llama : add support for Deepseek-R1-Qwen distill model (#11310) 2025-01-20 14:35:07 +01:00
ggml-vocab-deepseek-r1-qwen.gguf.out llama : add support for Deepseek-R1-Qwen distill model (#11310) 2025-01-20 14:35:07 +01:00
ggml-vocab-falcon.gguf
ggml-vocab-falcon.gguf.inp
ggml-vocab-falcon.gguf.out
ggml-vocab-gpt-2.gguf
ggml-vocab-gpt-2.gguf.inp
ggml-vocab-gpt-2.gguf.out
ggml-vocab-gpt-4o.gguf.inp llama : add Phi-4-mini support (supersede #12099) (#12108) 2025-02-28 12:44:11 +01:00
ggml-vocab-gpt-4o.gguf.out llama : add Phi-4-mini support (supersede #12099) (#12108) 2025-02-28 12:44:11 +01:00
ggml-vocab-gpt-neox.gguf
ggml-vocab-llama-bpe.gguf
ggml-vocab-llama-bpe.gguf.inp
ggml-vocab-llama-bpe.gguf.out
ggml-vocab-llama-spm.gguf
ggml-vocab-llama-spm.gguf.inp
ggml-vocab-llama-spm.gguf.out
ggml-vocab-llama4.gguf.inp llama : Support llama 4 text-only (#12791) 2025-04-07 23:06:44 +02:00
ggml-vocab-llama4.gguf.out llama : Support llama 4 text-only (#12791) 2025-04-07 23:06:44 +02:00
ggml-vocab-mpt.gguf
ggml-vocab-mpt.gguf.inp
ggml-vocab-mpt.gguf.out
ggml-vocab-phi-3.gguf
ggml-vocab-phi-3.gguf.inp
ggml-vocab-phi-3.gguf.out
ggml-vocab-pixtral.gguf.inp mtmd : Support Pixtral 12B (#13065) 2025-04-23 20:21:59 +02:00
ggml-vocab-pixtral.gguf.out mtmd : Support Pixtral 12B (#13065) 2025-04-23 20:21:59 +02:00
ggml-vocab-qwen2.gguf
ggml-vocab-qwen2.gguf.inp
ggml-vocab-qwen2.gguf.out
ggml-vocab-refact.gguf
ggml-vocab-refact.gguf.inp
ggml-vocab-refact.gguf.out
ggml-vocab-roberta-bpe.gguf.inp
ggml-vocab-roberta-bpe.gguf.out
ggml-vocab-starcoder.gguf
ggml-vocab-starcoder.gguf.inp
ggml-vocab-starcoder.gguf.out