llama.cpp

Saba Fallah a970515bdb mtmd: Add DeepSeekOCR Support (#17400 )	2026-03-25 19:57:40 +01:00
* mtmd: llama.cpp DeepSeekOCR support init commit
* loading sam tensors
* mtmd: fix vision model processing
* deepseek-ocr clip-vit model impl
* mtmd: add DeepSeek-OCR LM support with standard attention
* mtmd: successfully runs DeepSeek-OCR LM in llama-cli
* mtmd: Fix RoPE type for DeepSeek-OCR LM.
* loading LM testing Vision model loading
* sam warmup working
* sam erroneous return corrected
* clip-vit: corrected cls_embd concat
* clip-vit: model convert qkv_proj split
* corrected combining of image encoders' results
* fix: update callback for ffn_moe_weighted and add callback for attn_out in deepseek2 model
* concat image_newline and image_seperator tokens
* visual_model warmup (technically) works
* window partitioning using standard ggml ops
* sam implementation without using CPU only ops
* clip: fixed warnings
* Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr
* mtmd: fix get_rel_pos
* mtmd: fixed the wrong scaler for get_rel_pos
* image encoding technically works but the output can't be checked singe image decoding fails
* mtmd: minor changed
* mtmd: add native resolution support
* - image encoding debugged
  - issues fixed mainly related wrong config like n_patches etc.
  - configs need to be corrected in the converter
* mtmd: correct token order
* - dynamic resizing
  - changes are concerning PR https://github.com/sfallah/llama.cpp/pull/4
* mtmd: quick fix token order
* mtmd: fix danling pointer
* mtmd: SAM numerically works
* mtmd: debug CLIP-L (vit_pre_ln)
* mtmd: debug CLIP-L & first working DeepSeek-OCR model
* mtmd : add --dsocr-mode CLI argument for DeepSeek-OCR resolution control & all native resolution modes work
* mtmd: simplify SAM patch embedding
* mtmd: adapt Pillow image resizing function
* mtmd: simplify DeepSeek-OCR dynamic resolution preprocessing
* mtmd: remove --dsocr-mode argument
* mtmd: refactor code & remove unused helper functions
* mtmd: fix tensor names for image newlines and view separator
* clean up
* reverting automatically removed spaces
* reverting automatically removed spaces
* mtmd: fixed bad ocr check in Deepseek2 (LM)
* mtmd: support combined QKV projection in buid_vit
* using common build_attn in sam
* corrected code-branch when flash-attn disabled enabling usage of --flash-attn option
* mtmd: minor fix
* minor formatting and style
* fixed flake8 lint issues
* minor editorconfig-check fixes
* minor editorconfig-check fixes
* mtmd: simplify get_rel_pos
* mtmd: make sam hparams configurable
* mtmd: add detailed comments for resize_bicubic_pillow
* mtmd: fixed wrong input setting
* mtmd: convert model in FP16
* mtmd: minor fix
* mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template
* fix: test-1.jpg ORC issue with small (640) resolution setting min-resolution base (1024) max large (1280) for dynamic-resolution
* minor: editconfig-check fix
* merge with changes from https://github.com/ggml-org/llama.cpp/pull/17909 added new opt to tests.sh to disable flash-attn
* minor: editconfig-check fix
* testing deepseek-ocr quick and dirty test script comparing results of Qwen2.5-VL vs DeepSeek-OCR
* quick and (potential) dirty merge with https://github.com/ggml-org/llama.cpp/pull/17909
* refactoring, one single builder function and static helpers
* added deepseek-ocr test to tests.sh
* minor formatting fixes
* check with fixed expected resutls
* minor formatting
* editorconfig-check fix
* merge with changes from https://github.com/ggml-org/llama.cpp/pull/18042
* minor
  - added GLM-4.6V to big tests
  - added missing deps for python test
* convert: minor fix
* mtmd: format code
* convert: quick fix
* convert: quick fix
* minor python formatting
* fixed merge build issue
* merge resolved
  - fixed issues in convert
  - tested several deepseek models
* minor fix
* minor
* Update convert_hf_to_gguf.py
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* - removed clip_is_deepseekocr
  - removed redundant RESIZE_ALGO_BICUBIC_PILLOW resize-algo
  - simplified image-preprocessing
  - removed/simplified debug functions
* - cleaning commented out code
* fixing instabilities issues reintroducing resize_bicubic_pillow
* - use f16 model for deepseek-ocr test
  - ignore llama-arch test for deepseek-ocr
* rename fc_w --> mm_fc_w
* add links to OCR discussion
* cleaner loading code
* add missing .weight to some tensors
* add default jinja template (to be used by server)
* move test model to ggml-org
* rolling back upscale change
* Update convert_hf_to_gguf.py
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: bluebread <hotbread70127@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
afmoe.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
apertus.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
arcee.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
arctic.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
arwkv7.cpp	…
baichuan.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
bailingmoe.cpp	graph : remove redundant scale_w parameter (#20235 )	2026-03-08 18:58:28 +01:00
bailingmoe2.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
bert.cpp	models : move the token embedding norms to the first layer (#20943 )	2026-03-24 17:00:30 +02:00
bitnet.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
bloom.cpp	models : move the token embedding norms to the first layer (#20943 )	2026-03-24 17:00:30 +02:00
chameleon.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
chatglm.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
codeshell.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
cogvlm.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
cohere2-iswa.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
command-r.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
dbrx.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
deci.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
deepseek.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
deepseek2.cpp	mtmd: Add DeepSeekOCR Support (#17400 )	2026-03-25 19:57:40 +01:00
delta-net-base.cpp	graph : remove redundant GDN state transposes (#20443 )	2026-03-13 22:12:54 +02:00
dots1.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
dream.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
ernie4-5-moe.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
ernie4-5.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
eurobert.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
exaone-moe.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
exaone.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
exaone4.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
falcon-h1.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
falcon.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
gemma-embedding.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
gemma.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
gemma2-iswa.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
gemma3.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
gemma3n-iswa.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
glm4-moe.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
glm4.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
gpt2.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
gptneox.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
granite-hybrid.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
granite.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
grok.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
grovemoe.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
hunyuan-dense.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
hunyuan-moe.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
internlm2.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
jais.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
jais2.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
jamba.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
kimi-linear.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
lfm2.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
llada-moe.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
llada.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
llama-iswa.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
llama.cpp	graph : add optional scale parameter to build_lora_mm [no ci] (#20427 )	2026-03-12 00:22:49 +01:00
maincoder.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
mamba-base.cpp	model : wire up Nemotron-H tensors for NVFP4 support (#20561 )	2026-03-16 09:19:16 +01:00
mamba.cpp	models : deduplicate delta-net graphs for Qwen family (#19597 )	2026-02-16 14:35:04 +02:00
mimo2-iswa.cpp	graph : remove redundant scale_w parameter (#20235 )	2026-03-08 18:58:28 +01:00
minicpm3.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
minimax-m2.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
mistral3.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
models.h	llama : enable chunked fused GDN path (#20340 )	2026-03-11 22:46:40 +02:00
modern-bert.cpp	models : move the token embedding norms to the first layer (#20943 )	2026-03-24 17:00:30 +02:00
mpt.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
nemotron-h.cpp	model : wire up Nemotron-H tensors for NVFP4 support (#20561 )	2026-03-16 09:19:16 +01:00
nemotron.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
neo-bert.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
olmo.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
olmo2.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
olmoe.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
openai-moe-iswa.cpp	graph : remove redundant scale_w parameter (#20235 )	2026-03-08 18:58:28 +01:00
openelm.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
orion.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
paddleocr.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
pangu-embedded.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
phi2.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
phi3.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
plamo.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
plamo2.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
plamo3.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
plm.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
qwen.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
qwen2.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
qwen2moe.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
qwen2vl.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
qwen3.cpp	graph : add optional scale parameter to build_lora_mm [no ci] (#20427 )	2026-03-12 00:22:49 +01:00
qwen3moe.cpp	graph : add optional scale parameter to build_lora_mm [no ci] (#20427 )	2026-03-12 00:22:49 +01:00
qwen3next.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
qwen3vl-moe.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
qwen3vl.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
qwen35.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
qwen35moe.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
refact.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
rnd1.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
rwkv6-base.cpp	models : deduplicate delta-net graphs for Qwen family (#19597 )	2026-02-16 14:35:04 +02:00
rwkv6.cpp	models : move the token embedding norms to the first layer (#20943 )	2026-03-24 17:00:30 +02:00
rwkv6qwen2.cpp	…
rwkv7-base.cpp	models : deduplicate delta-net graphs for Qwen family (#19597 )	2026-02-16 14:35:04 +02:00
rwkv7.cpp	models : move the token embedding norms to the first layer (#20943 )	2026-03-24 17:00:30 +02:00
seed-oss.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
smallthinker.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
smollm3.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
stablelm.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
starcoder.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
starcoder2.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
step35-iswa.cpp	model : add control vector support where missing (#20653 )	2026-03-18 23:25:12 +01:00
t5-dec.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
t5-enc.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00
wavtokenizer-dec.cpp	models : move the token embedding norms to the first layer (#20943 )	2026-03-24 17:00:30 +02:00
xverse.cpp	llama: dynamic head_dim and n_rot for SWA (#20301 )	2026-03-09 22:22:39 +01:00