Saba Fallah
e0e69fd3fb
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr-merge_#17965
...
# Conflicts:
# src/llama-kv-cache.cpp
# tools/mtmd/clip.cpp
2025-12-13 10:59:46 +01:00
Georgi Gerganov
7bed317f53
models : fix the attn_factor for mistral3 graphs + improve consistency ( #17945 )
...
* models : fix the attn_factor for mistral3 graphs
* cont : rework attn_factor correction logic
* cont : make deepseek2 consistent
* cont : add TODO
* cont : special-case DSv2
* cont : revert Mistral 3 Large changes
* cont : fix DS2 to use the original attn_factor
* cont : minor comments
2025-12-12 17:12:40 +02:00
Saba Fallah
33fabf0bd8
Merge branch 'master' into sf/deepseek-ocr-merge-test
...
# Conflicts:
# tools/mtmd/clip.cpp
# tools/mtmd/mtmd-cli.cpp
2025-12-11 08:13:50 +01:00
Xuan-Son Nguyen
9e79b0116e
convert: allow using quantized Mistral weight ( #17889 )
...
* convert: allow using quantized Mistral weight
* data_torch.ndim
* update dequant fn
Co-authored-by: compilade <compilade@users.noreply.github.com>
---------
Co-authored-by: compilade <compilade@users.noreply.github.com>
2025-12-10 10:26:22 +01:00
philip-essential
1d2a1ab73d
model : support Rnj-1 ( #17811 )
...
* add support for rnj1
* refactor gemma3 to support rnj-1
* address review comments
2025-12-09 04:49:03 +01:00
bluebread
48c6cf2132
mtmd: convert model in FP16
2025-12-08 02:36:00 +00:00
Xuan-Son Nguyen
dbc15a7967
convert: support Mistral 3 Large MoE ( #17730 )
...
* convert: support Mistral 3 Large MoE
* filter out vision tensors, add missing keys
* handle vocab
* add temperature_length
* fix mscale_all_dim
* clean up
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* fix
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-06 10:49:33 +01:00
bluebread
2d918b3e21
mtmd: make sam hparams configurable
2025-12-06 06:55:53 +00:00
bluebread
15f2ada0ed
mtmd: simplify get_rel_pos
2025-12-06 06:32:41 +00:00
Saba Fallah
1c88647ec6
fixed flake8 lint issues
2025-12-05 12:24:10 +01:00
Saba Fallah
5f2ee1aecf
Merge branch 'ggml-org:master' into sf/deepseek-ocr
2025-12-05 11:56:06 +01:00
bluebread
2dd9924076
Merge branch 'sf/deepseek-ocr-cleanup' of github.com:sfallah/llama.cpp into sf/deepseek-ocr-cleanup
2025-12-04 16:52:00 +00:00
bluebread
c89171cf4d
mtmd: fixed bad ocr check in Deepseek2 (LM)
2025-12-04 16:50:05 +00:00
Saba Fallah
386ba479a2
clean up
2025-12-04 15:05:58 +01:00
SmartestWashingMachine
3659aa28e9
convert: use existing local chat_template if mistral-format model has one. ( #17749 )
...
* conversion: use existing local chat_template.jinja file if mistral-format model has one.
* fix --mistral-format mistakenly assuming some <=v7 chat template names are file paths and reading them.
* Update convert_hf_to_gguf.py - change from exists() to is_file()
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-04 12:12:45 +01:00
Saba Fallah
66341666fb
Merge branch 'master' into sf/deepseek-ocr
...
# Conflicts:
# convert_hf_to_gguf.py
# tools/mtmd/clip.h
# tools/mtmd/mtmd.cpp
2025-12-02 21:02:13 +01:00
Xuan-Son Nguyen
2c453c6c77
convert: add error message for mistral3 quantized weight ( #17686 )
2025-12-02 11:48:31 +01:00
Xuan-Son Nguyen
cd3c118908
model: support Ministral3 ( #17644 )
...
* conversion script
* support ministral 3
* maybe this is better?
* add TODO for rope_yarn_log_mul
* better ppl (tested on 14B-Instruct)
* Add Ministral3 support to Mistral format
* improve arch handling
* add sizes
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* nits
---------
Co-authored-by: Julien Denize <julien.denize@mistral.ai>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-01 12:26:52 +01:00
bluebread
55430945ef
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr
2025-11-30 08:55:29 +00:00
Saba Fallah
ed3b7f1056
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
...
# Conflicts:
# convert_hf_to_gguf.py
# src/llama-model.cpp
# src/models/deepseek2.cpp
2025-11-30 08:29:09 +01:00
bluebread
a488b495f7
mtmd: SAM numerically works
2025-11-29 02:17:49 +00:00
Piotr Wilkin (ilintar)
ff55414c42
model : Qwen3 Next ( #16095 )
...
* Qwen3 Next - cleaned up version
* Whitespaces and stuff
* Correct minor errors
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Misc. fixes.
* Clean up code, add missing hybrid qualifier
* Did someone transpose the SOLVE_TRI result matrix? Perhaps...
* Whitespace
* Proper tensors for cb calls
* Use llama-graph.h vertical alignment
* BROKEN: chunking
* Set new tensors as inputs.
* Proper chunk logic
* It's the circle of life...
* More shenanigans for n_seq > 1
* Nail in the coffin?
* Fix Windows build
* Eh, one fails on Windows, the other fails on Mac... just use general capture.
* quant : cleanup
* model : cleanup
* qwen3 : cleanup
* cont : cleanup
* cont : cleanup
* ggml : revert change
* qwen3 : cleanup
* cont : cleanup
* Readd cmath
* qwen3 : fix typo
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Usual suspects
* fix my bad suggestion
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-28 12:02:56 +01:00
Aleksei Nikiforov
05872ac885
convert : fix big-endian conversion ( #17431 )
...
* Fix convert_hf_to_gguf.py script on s390x
Assume converted model data is originally little-endian.
Byteswap data on s390x after reading it so that values are in the correct representation
for any transformation needed, like calculating weight tensors.
Then byteswap data to little-endian before passing it to GGUFWriter;
GGUFWriter will byteswap data back to big-endian if big-endian output is requested.
byteswap(inplace=True) calls don't work with lazy tensor and array wrappers.
Use byteswap with copying data to work around this behaviour.
* Make GGUFWriter accept tensors in native endianness instead of little-endian
With this change, if no byteswapping is actually needed, two excessive byteswaps can be omitted on s390x
* Fix byteswapping in convert_hf_to_gguf.py for remote models
2025-11-25 14:18:16 +01:00
Sigbjørn Skjæret
b61de2b2df
convert : allow quantizing lora again ( #17453 )
2025-11-24 15:50:55 +01:00
william pan
4902eebe33
models : Added support for RND1 Diffusion Language Model ( #17433 )
...
* Converted RND1 model to GGUF weights
* RND1 llama.cpp support v1
* RND1 llama.cpp support v2 non causal bug
* RND1 llama.cpp support v3 documentation
* RND1 llama.cpp support v4 clean code
* linting issues
* RND1 pr fixes v1
* RND1 pr fixes v2
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Diffusion documentation edits
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-24 14:16:56 +08:00
Sigbjørn Skjæret
07b0e7a5ac
convert : use self.block_count everywhere instead of reading hparams ( #17359 )
2025-11-19 11:52:38 +01:00
Saba Fallah
1e08157134
clip-vit: model convert qkv_proj split
2025-11-17 21:19:51 +01:00
Saba Fallah
13dc6fb305
Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr
2025-11-17 11:25:16 +01:00
Saba Fallah
97e0907c5b
loading LM
...
testing Vision model loading
2025-11-17 11:07:33 +01:00
bluebread
76305878d5
mtmd: successfully runs DeepSeek-OCR LM in llama-cli
2025-11-16 08:45:08 +00:00
Sigbjørn Skjæret
662192e1dc
convert : remove unnecessary chat template patching ( #17289 )
2025-11-15 20:58:59 +01:00
bluebread
eab28ed318
mtmd: add DeepSeek-OCR LM support with standard attention
2025-11-15 17:28:18 +00:00
Sigbjørn Skjæret
9a8860cf5d
convert : use all parts in safetensors index ( #17286 )
2025-11-15 14:12:39 +01:00
Sigbjørn Skjæret
9d3ef4809f
convert : set expert gating func in base class ( #17279 )
2025-11-15 14:06:24 +01:00
bluebread
85c7cda8eb
mtmd: fix vision model processing
2025-11-15 04:20:01 +00:00
Bartowski
e1fcf8b09b
model : add AfmoeForCausalLM support ( #16477 )
...
* Add AFMOE model support
* Update to vocab
* Add model sizing
* Undo Rope change for ARCEE model
* Address review comments
* Update modeling code is_sliding -> use_rope, replace hard-coded logic
* Fix AFMOE tokenizer
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update AFMoE tokenizer class identification to be more unique
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-14 13:54:10 +01:00
Saba Fallah
43a130b4d0
mtmd: llama.cpp DeepSeekOCR support
...
init commit
2025-11-14 12:40:20 +01:00
levkropp
2fc392ce35
convert : register UMT5Model architecture for T5 conversion ( #17160 )
...
Register UMT5Model as a supported architecture variant for T5 model conversion.
This allows the conversion to work for models downloaded with AutoModel.
2025-11-11 09:38:30 +01:00
compilade
802cef44bf
convert : parse safetensors directly ( #15667 )
...
* convert : parse safetensors directly
* gguf-py : order safetensors tensors by name
Applies to both local and remote safetensors custom parsing.
This matches the behavior of the official safetensors implementation.
* convert : rename from_safetensors_meta to from_local_tensor
For consistency with from_remote_tensor
* convert : fix no-lazy dtypes from direct safetensors
2025-11-09 09:49:40 -05:00
compilade
1c07c0c68c
convert : handle compressed-tensors quant method ( #17069 )
...
* convert : handle compressed-tensors quant method
* convert : handle int-quantized models
* convert : handle naive-quantized models
* gguf-py : __pos__ is also unary
* convert : fix flake8 lint
* convert : use F32 for dequant of pack-quantized tensors
2025-11-09 09:45:50 -05:00
Li Pengzhan
9f052478c2
model : add openPangu-Embedded ( #16941 )
...
* Model: add openPangu-Embedded
* fixed according to reviewer's comments
* fixed the chat template check condition
* Apply suggestions from code review
change the chat-template check condition and some formatting issue
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* whitespace cleanup
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-05 10:28:58 +01:00
Zhiyong Wang
6b9a52422b
model: add Janus Pro for image understanding ( #16906 )
...
* Add support for Janus Pro
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Address reviewer suggestions
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Add JANUS_PRO constant
* Update clip model handling
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Refactor JANUS_PRO handling in clip.cpp
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
* Update tools/mtmd/clip.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* em whitespace
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-11-02 22:08:04 +01:00
Piotr Wilkin (ilintar)
0de0a01576
model : Minimax M2 ( #16831 )
...
* Model: Minimax M2
* Cleanup
* Cleanup pt. 2
* Cleanup pt. 3
* Update convert_hf_to_gguf_update.py - merge catch blocks
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Remove vocab models and test
* Remove all redundant hparam settings covered by TextModel
* Move super to start, don't set block_count
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update gguf-py/gguf/constants.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-31 21:20:47 +01:00
JJJYmmm
d261223d24
model: add support for qwen3vl series ( #16780 )
...
* support qwen3vl series.
Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com>
Co-authored-by: yairpatch <yairpatch@users.noreply.github.com>
Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com>
* bugfix: fix the arch check for qwen3vl-moe.
* use build_ffn
* optimize deepstack structure
* optimize deepstack feature saving
* Revert "optimize deepstack feature saving" for temporal fix
This reverts commit f321b9fdf1.
* code clean
* use fused qkv in clip
* clean up / rm is_deepstack_layers for simplification
* add test model
* move test model to "big" section
* fix imrope check
* remove trailing whitespace
* fix rope fail
* metal : add imrope support
* add imrope support for sycl
* vulkan: add imrope w/o check
* fix vulkan
* webgpu: add imrope w/o check
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* fix tensor mapping
---------
Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com>
Co-authored-by: yairpatch <yairpatch@users.noreply.github.com>
Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-30 16:19:14 +01:00
Tianyue-Zhao
bacddc049a
model: Add support for CogVLM model ( #15002 )
...
* Added GGUF mappings for CogVLM model
* Add tensor mapping for CogVLM visual encoder
* Add CogVLM to conversion script, no vision part yet
* Added CogVLM vision model to conversion script
* Add graph for CogVLM CLIP model
* Add graph for CogVLM
* Fixes for CogVLM. Now compiles.
* Model now runs
* Fixes for cogvlm graph
* Account for graph context change after rebase
* Changes for whitespace
* Changes in convert script according to comments
* Switch CogVLM LLM graph to merged QKV tensor
* Use rope_type variable instead of direct definition
* Change CogVLM CLIP encoder to use SWIGLU
* Switch CogVLM CLIP to use merged QKV
* Apply rebase edits and remove ggml_cont call that is now unnecessary
* clean up
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-10-30 12:18:50 +01:00
Xuan-Son Nguyen
c55d53acec
model : add LightOnOCR-1B model ( #16764 )
...
* model : add LightOnOCR-1B model
* add test
2025-10-27 16:02:58 +01:00
Sigbjørn Skjæret
73a48c9790
convert : enable expert group selection for all models with it ( #16691 )
2025-10-26 17:21:23 +01:00
Galunid
5d195f17bc
convert : handle mmproj filename/path properly ( #16760 )
...
* convert: handle mmproj model output filename properly
* remove redundant commits
* Add model_type to gguf utility
* Use mmproj- prefix instead of suffix
* Apply CISC suggestion
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-25 20:41:36 +02:00
compilade
5cca2542ac
convert : avoid dequantizing mxfp4 for GPT-OSS ( #16756 )
2025-10-24 20:52:00 -04:00
compilade
f8f071fadd
convert : handle pre-quantized models ( #14810 )
...
* convert : begin handling pre-quantized models
* convert : fix conversion from FP8 for Deepseek-V3.1-Base
2025-10-23 16:31:41 -04:00