llama.cpp/src/models
Saba Fallah a970515bdb
mtmd: Add DeepSeekOCR Support (#17400)
* mtmd: llama.cpp DeepSeekOCR support
init commit

* loading sam tensors

* mtmd: fix vision model processing

* deepseek-ocr clip-vit model impl

* mtmd: add DeepSeek-OCR LM support with standard attention

* mtmd: successfully runs DeepSeek-OCR LM in llama-cli

* mtmd: Fix RoPE type for DeepSeek-OCR LM.

* loading LM
testing Vision model loading

* sam warmup working

* sam erroneous return corrected

* clip-vit:  corrected cls_embd concat

* clip-vit: model convert  qkv_proj split

* corrected combining of image encoders' results

* fix: update callback for ffn_moe_weighted and add callback for attn_out in deepseek2 model

* concat image_newline and image_seperator tokens

* visual_model warmup (technically) works

* window partitioning using standard ggml ops

* sam implementation without using CPU only ops

* clip: fixed warnings

* Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr

* mtmd: fix get_rel_pos

* mtmd: fixed the wrong scaler for get_rel_pos

* image encoding technically works but the output can't be checked since image decoding fails

* mtmd: minor changes

* mtmd: add native resolution support

* - image encoding debugged
- issues fixed, mainly related to wrong config values like n_patches etc.
- configs need to be corrected in the converter

* mtmd: correct token order

* - dynamic resizing
- changes are concerning PR https://github.com/sfallah/llama.cpp/pull/4

* mtmd: quick fix token order

* mtmd: fix dangling pointer

* mtmd: SAM numerically works

* mtmd: debug CLIP-L (vit_pre_ln)

* mtmd: debug CLIP-L & first working DeepSeek-OCR model

* mtmd : add --dsocr-mode CLI argument for DeepSeek-OCR resolution control & all native resolution modes work

* mtmd: simplify SAM patch embedding

* mtmd: adapt Pillow image resizing function

* mtmd:  simplify DeepSeek-OCR dynamic resolution preprocessing

* mtmd: remove --dsocr-mode argument

* mtmd: refactor code & remove unused helper functions

* mtmd: fix tensor names for image newlines and view separator

* clean up

* reverting automatically removed spaces

* reverting automatically removed spaces

* mtmd: fixed bad ocr check in Deepseek2 (LM)

* mtmd: support combined QKV projection in build_vit

* using common build_attn in sam

* corrected code-branch when flash-attn disabled
enabling usage of --flash-attn option

* mtmd: minor fix

* minor formatting and style

* fixed flake8 lint issues

* minor editorconfig-check fixes

* minor editorconfig-check fixes

* mtmd: simplify get_rel_pos

* mtmd: make sam hparams configurable

* mtmd: add detailed comments for resize_bicubic_pillow

* mtmd: fixed wrong input setting

* mtmd: convert model in FP16

* mtmd: minor fix

* mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template

* fix: test-1.jpg OCR issue with small (640) resolution
setting min-resolution base (1024) max large (1280) for dynamic-resolution

* minor: editorconfig-check fix

* merge with changes from https://github.com/ggml-org/llama.cpp/pull/17909
added new opt to tests.sh to disable flash-attn

* minor: editorconfig-check fix

* testing deepseek-ocr
quick and dirty test script comparing results of Qwen2.5-VL vs DeepSeek-OCR

* quick and (potentially) dirty merge with https://github.com/ggml-org/llama.cpp/pull/17909

* refactoring, one single builder function and static helpers

* added deepseek-ocr test to tests.sh

* minor formatting fixes

* check with fixed expected results

* minor formatting

* editorconfig-check fix

* merge with changes from https://github.com/ggml-org/llama.cpp/pull/18042

* minor
- added GLM-4.6V to big tests
- added missing deps for python test

* convert: minor fix

* mtmd: format code

* convert: quick fix

* convert: quick fix

* minor python formatting

* fixed merge build issue

* merge resolved
- fixed issues in convert
- tested several deepseek models

* minor fix

* minor

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* - removed clip_is_deepseekocr
- removed redundant RESIZE_ALGO_BICUBIC_PILLOW resize-algo
- simplified image-preprocessing
- removed/simplified debug functions

* - cleaning commented out code

* fixing instability issues by reintroducing resize_bicubic_pillow

* - use f16 model for deepseek-ocr test
- ignore llama-arch test for deepseek-ocr

* rename fc_w --> mm_fc_w

* add links to OCR discussion

* cleaner loading code

* add missing .weight to some tensors

* add default jinja template (to be used by server)

* move test model to ggml-org

* rolling back upscale change

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: bluebread <hotbread70127@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-03-25 19:57:40 +01:00
afmoe.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
apertus.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
arcee.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
arctic.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
arwkv7.cpp
baichuan.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
bailingmoe.cpp graph : remove redundant scale_w parameter (#20235) 2026-03-08 18:58:28 +01:00
bailingmoe2.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
bert.cpp models : move the token embedding norms to the first layer (#20943) 2026-03-24 17:00:30 +02:00
bitnet.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
bloom.cpp models : move the token embedding norms to the first layer (#20943) 2026-03-24 17:00:30 +02:00
chameleon.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
chatglm.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
codeshell.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
cogvlm.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
cohere2-iswa.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
command-r.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
dbrx.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
deci.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
deepseek.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
deepseek2.cpp mtmd: Add DeepSeekOCR Support (#17400) 2026-03-25 19:57:40 +01:00
delta-net-base.cpp graph : remove redundant GDN state transposes (#20443) 2026-03-13 22:12:54 +02:00
dots1.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
dream.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
ernie4-5-moe.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
ernie4-5.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
eurobert.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
exaone-moe.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
exaone.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
exaone4.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
falcon-h1.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
falcon.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
gemma-embedding.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
gemma.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
gemma2-iswa.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
gemma3.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
gemma3n-iswa.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
glm4-moe.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
glm4.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
gpt2.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
gptneox.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
granite-hybrid.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
granite.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
grok.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
grovemoe.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
hunyuan-dense.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
hunyuan-moe.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
internlm2.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
jais.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
jais2.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
jamba.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
kimi-linear.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
lfm2.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
llada-moe.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
llada.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
llama-iswa.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
llama.cpp graph : add optional scale parameter to build_lora_mm [no ci] (#20427) 2026-03-12 00:22:49 +01:00
maincoder.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
mamba-base.cpp model : wire up Nemotron-H tensors for NVFP4 support (#20561) 2026-03-16 09:19:16 +01:00
mamba.cpp models : deduplicate delta-net graphs for Qwen family (#19597) 2026-02-16 14:35:04 +02:00
mimo2-iswa.cpp graph : remove redundant scale_w parameter (#20235) 2026-03-08 18:58:28 +01:00
minicpm3.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
minimax-m2.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
mistral3.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
models.h llama : enable chunked fused GDN path (#20340) 2026-03-11 22:46:40 +02:00
modern-bert.cpp models : move the token embedding norms to the first layer (#20943) 2026-03-24 17:00:30 +02:00
mpt.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
nemotron-h.cpp model : wire up Nemotron-H tensors for NVFP4 support (#20561) 2026-03-16 09:19:16 +01:00
nemotron.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
neo-bert.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
olmo.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
olmo2.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
olmoe.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
openai-moe-iswa.cpp graph : remove redundant scale_w parameter (#20235) 2026-03-08 18:58:28 +01:00
openelm.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
orion.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
paddleocr.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
pangu-embedded.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
phi2.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
phi3.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
plamo.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
plamo2.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
plamo3.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
plm.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
qwen.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
qwen2.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
qwen2moe.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
qwen2vl.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
qwen3.cpp graph : add optional scale parameter to build_lora_mm [no ci] (#20427) 2026-03-12 00:22:49 +01:00
qwen3moe.cpp graph : add optional scale parameter to build_lora_mm [no ci] (#20427) 2026-03-12 00:22:49 +01:00
qwen3next.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
qwen3vl-moe.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
qwen3vl.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
qwen35.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
qwen35moe.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
refact.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
rnd1.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
rwkv6-base.cpp models : deduplicate delta-net graphs for Qwen family (#19597) 2026-02-16 14:35:04 +02:00
rwkv6.cpp models : move the token embedding norms to the first layer (#20943) 2026-03-24 17:00:30 +02:00
rwkv6qwen2.cpp
rwkv7-base.cpp models : deduplicate delta-net graphs for Qwen family (#19597) 2026-02-16 14:35:04 +02:00
rwkv7.cpp models : move the token embedding norms to the first layer (#20943) 2026-03-24 17:00:30 +02:00
seed-oss.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
smallthinker.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
smollm3.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
stablelm.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
starcoder.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
starcoder2.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
step35-iswa.cpp model : add control vector support where missing (#20653) 2026-03-18 23:25:12 +01:00
t5-dec.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
t5-enc.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00
wavtokenizer-dec.cpp models : move the token embedding norms to the first layer (#20943) 2026-03-24 17:00:30 +02:00
xverse.cpp llama: dynamic head_dim and n_rot for SWA (#20301) 2026-03-09 22:22:39 +01:00