Commit Graph

139 Commits

Author SHA1 Message Date
Saba Fallah 6978c37fe6 Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr 2026-02-02 12:09:28 +01:00
Saba Fallah a94c241751 merge resolved
- fixed issues in convert
- tested several deepseek models
2026-02-02 12:07:35 +01:00
tc-mb ec6c7421e4
mtmd: support MiniCPM-o 4.5(vision only) (#19211)
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
2026-01-30 23:19:30 +01:00
Saba Fallah ded92076a8 Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
# Conflicts:
#	convert_hf_to_gguf.py
#	gguf-py/gguf/gguf_writer.py
#	gguf-py/gguf/tensor_mapping.py
#	src/llama-model.cpp
#	src/models/deepseek2.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/mtmd/clip-impl.h
#	tools/mtmd/clip.cpp
#	tools/mtmd/clip.h
2026-01-28 13:39:39 +01:00
Piotr Wilkin (ilintar) d98b548120
Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914)
* Extract common debugging functions; plug eval-callback and mtmd's MTMD_DEBUG_GRAPH with same functionality

* Move to common

* Remove unneeded header

* Unlink from common

* chore: update webui build output

* Cleanup; properly pass params to mtmd without depending on common; factorize debug.cpp to use common debug code.

* Revert change to webapp

* Post-merge adjust

* Apply suggestions from code review

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Apply code review changes

* Remove changes to server-context

* Remove mtmd.h include

* Remove utility functions from header

* Apply suggestions from code review

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Rename functions

* Update tools/mtmd/clip.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Update tools/mtmd/clip.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Update tools/mtmd/clip.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-01-14 20:29:35 +01:00
Xuan-Son Nguyen e047f9ee9d
mtmd: fix use_non_causal being reported incorrectly (#18793)
* mtmd: fix use_non_causal being reported incorrectly

* move clip_is_mrope to mtmd_decode_use_mrope

* fix sloppy code ggml_cpy
2026-01-13 12:19:38 +01:00
Simranjeet Singh a61c8bc3bf
mtmd: Add Gemma3n multimodal support with MobileNetV5 vision encoder (#18256)
* Add Gemma3nVisionModel - MobileNetV5 vision encoder converter to convert_hf_to_gguf.py. Add gemma3n to vision projectors in gguf-py/gguf/constants.py.

* Add mobilenetv5 impl

* Fix comments, remove unused vars

* Fix permute and remove transpose of projection weights

* Fix comments, remove debugging prints from hf_to_gguf

* 1. Hard-code image_mean = 0 and image_std = 1
2. Use available tensor mapping logic
3. Remove redundant chat template replacement of soft tokens placeholder with media placeholder

* 1. Move mobilenetv5 helpers declarations to `clip_graph_mobilenetv5` struct and definitions to mobilenetv5.cpp
2. Remove unused `clip_is_gemma3n` func declarations and definitions
3. Remove redundant `rescale_image_u8_to_f32` func and use `normalize_image_u8_to_f32` with zero mean and unit std
4. Calculate n_patches using image_size / patch_size

* Remove obsolete comments

* - convert_hf_to_gguf.py & constants.py & tensor_mapping.py: Use explicit mapping: Custom map for double indexed blocks and tensor_mapping.py for rest
- convert_hf_to_gguf.py: Unsqueeze Stem Bias and Layer scale tensors to correct shape while converting to gguf
- mobilenetv5.cpp: Remove explicit reshaping of Stem Bias and Layer scale which are now handled while converting to gguf, replace fprintf with LOG_*
- clip.cpp: Remove unused embedding and hard_emb_norm tensor loading

* - Rename tensors to v.conv..., v.blk..., v.msfa... to better align with already existing terminology

* Fix stem conv bias name

* Remove explicit handling of bias term for stem conv

* - Change order of addition in "project_per_layer_inputs" to support broadcasting of vision inp_per_layer
- Simplify the vision embeddings path of "get_per_layer_inputs" to output [n_embd_altup, n_layer, 1], broadcastable

* clean up conversion script

* fix code style

* also preserve audio tensors

* trailing space

* split arch A and V

* rm unused gemma3 func

* fix alignment

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-01-09 23:42:38 +01:00
Tarek Dakhran 4974bf53cf
model : mtmd : make input norm optional in LFM2-VL (#18594)
Upcoming LFM2-VL releases will have configurable input norm.
See https://github.com/huggingface/transformers/pull/43087 for details.
2026-01-04 18:50:02 +01:00
tt ced765be44
model: support youtu-vl model (#18479)
* Support Youtu-VL Model

* merge code

* fix bug

* revert qwen2 code & support rsplit in minja.hpp

* update warm info

* fix annotation

* u

* revert minja.hpp

* fix

* Do not write routed_scaling_factor to gguf when routed_scaling_factor is None

* fix expert_weights_scale

* LGTM after whitespace fixes

* fix

* fix

* fix

* layers to layer_index

* enum fix

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-01 19:25:54 +01:00
Henry147147 9b8329de7a
mtmd : Adding support for Nvidia Music Flamingo Model (#18470)
* Initial commit, debugging q5_k_s quant

* Made hf_to_gguf extend whisper to reduce code duplication

* addressed convert_hf_to_gguf pull request issue

---------

Co-authored-by: Henry D <henrydorsey147@gmail.com>
2025-12-31 12:13:23 +01:00
Saba Fallah 4d91711e5c fixed merge build issue 2025-12-19 11:14:36 +01:00
Saba Fallah 9a05e1d116
Merge branch 'master' into sf/deepseek-ocr 2025-12-19 11:08:29 +01:00
Xuan-Son Nguyen 8ea958d4d9
model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106)
* ASR with LFM2-Audio-1.5B

* Set rope_theta

* Fix comment

* Remove rope_theta setting

* Address PR feedback

* rename functions to conformer

* remove some redundant ggml_cont

* fix missing tensor

* add prefix "a." for conv tensors

* remove redundant reshape

* clean up

* add test model

---------

Co-authored-by: Tarek Dakhran <tarek@liquid.ai>
2025-12-19 00:18:01 +01:00
bluebread 5a741fda55 mtmd: format code 2025-12-17 03:26:38 +00:00
Saba Fallah 512b2c8fe4 merge with changes from https://github.com/ggml-org/llama.cpp/pull/18042 2025-12-16 14:07:04 +01:00
Saba Fallah 51c3de6887 Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
# Conflicts:
#	gguf-py/gguf/constants.py
#	gguf-py/gguf/tensor_mapping.py
#	tools/mtmd/clip-impl.h
#	tools/mtmd/clip.cpp
#	tools/mtmd/models/models.h
2025-12-16 12:16:25 +01:00
Xuan-Son Nguyen 3d86c6c2b5
model: support GLM4V vision encoder (#18042)
* convert ok

* no deepstack

* less new tensors

* cgraph ok

* add mrope for text model

* faster patch merger

* add GGML_ROPE_TYPE_MRNORM

* add support for metal

* move glm4v to dedicated graph

* convert: add norm_embd

* clip: add debugging fn

* working correctly

* fix style

* use bicubic

* fix mrope metal

* improve cpu

* convert to neox ordering on conversion

* revert backend changes

* force stop if using old weight

* support moe variant

* fix conversion

* fix convert (2)

* Update tools/mtmd/clip-graph.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* process mrope_section on TextModel base class

* resolve conflict merge

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-16 11:25:26 +01:00
Saba Fallah 4a4f82968c
Merge branch 'ggml-org:master' into sf/deepseek-ocr 2025-12-16 09:09:52 +01:00
Xuan-Son Nguyen 96a181a933
mtmd: refactor audio preprocessing (#17978)
* mtmd: refactor audio preprocessing

* refactor

Co-authored-by: Tarek <tdakhran@users.noreply.github.com>

* wip

* wip (2)

* improve constructor

* fix use_natural_log

* fix padding for short input

* clean up

* remove need_chunking

---------

Co-authored-by: Tarek <tdakhran@users.noreply.github.com>
2025-12-15 14:16:52 +01:00
Saba Fallah b3bf8cba05 Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
# Conflicts:
#	convert_hf_to_gguf.py
2025-12-15 10:19:50 +01:00
piDack 745fa0e78b
model : add glm-asr support (#17901)
* [model] add glm-asr support

* fix format for ci

* fix convert format for ci

* update glm_asr convert script & use build_ffn for glm_asr clip & use build_stack for padding and review

* check root architecture for convert hf script

* fix conflict with upstream

* fix convert script for glm asr & format clip-impl

* format

* restore hparams text

* improved conversion

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-15 03:18:46 +01:00
Haowei Wu 37f5a1093b
mtmd: enhance image resizing in llava_uhd (#18014) 2025-12-14 15:57:52 +01:00
Saba Fallah 6c36c03815 minor formatting fixes 2025-12-14 15:14:32 +01:00
Saba Fallah f95a6fe9f3 quick and (potentially) dirty merge with https://github.com/ggml-org/llama.cpp/pull/17909 2025-12-13 13:52:46 +01:00
Saba Fallah e0e69fd3fb Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr-merge_#17965
# Conflicts:
#	src/llama-kv-cache.cpp
#	tools/mtmd/clip.cpp
2025-12-13 10:59:46 +01:00
Xuan-Son Nguyen e39a2ce66d
clip: move model cgraphs into their own files (#17965)
* clip: move model cgraphs into their own files

* more explicit enums

* fix linux build

* fix naming

* missing headers

* nits: add comments for contributors
2025-12-12 21:14:48 +01:00
Saba Fallah d70f171fac merge with changes from https://github.com/ggml-org/llama.cpp/pull/17909
added new opt to tests.sh to disable flash-attn
2025-12-11 10:11:27 +01:00
Saba Fallah 33fabf0bd8 Merge branch 'master' into sf/deepseek-ocr-merge-test
# Conflicts:
#	tools/mtmd/clip.cpp
#	tools/mtmd/mtmd-cli.cpp
2025-12-11 08:13:50 +01:00
Xuan-Son Nguyen c6b2c9310c
mtmd: some small clean up (#17909)
* clip: add support for fused qkv in build_vit

* use build_ffn whenever possible

* fix internvl

* mtmd-cli: move image to beginning

* test script: support custom args
2025-12-10 22:20:06 +01:00
Saba Fallah ed944cd25b fix: test-1.jpg OCR issue with small (640) resolution
setting min resolution to base (1024) and max to large (1280) for dynamic resolution
2025-12-10 20:20:55 +01:00
Georgi Gerganov 4dff236a52
ggml : remove GGML_KQ_MASK_PAD constant (#17910)
* ggml : remove GGML_KQ_MASK_PAD constant

* cont : remove comment
2025-12-10 20:53:16 +02:00
bluebread 5174a1e69a mtmd: minor fix 2025-12-08 04:54:19 +00:00
bluebread 48c6cf2132 mtmd: convert model in FP16 2025-12-08 02:36:00 +00:00
bluebread 53273f83f8 mtmd: fixed wrong input setting 2025-12-07 23:58:22 +00:00
bluebread 5dfcc5abb1 mtmd: add detailed comments for resize_bicubic_pillow 2025-12-07 10:15:09 +00:00
bluebread 2d918b3e21 mtmd: make sam hparams configurable 2025-12-06 06:55:53 +00:00
bluebread 15f2ada0ed mtmd: simplify get_rel_pos 2025-12-06 06:32:41 +00:00
Saba Fallah d981f19e9d minor editorconfig-check fixes 2025-12-05 13:18:15 +01:00
Saba Fallah 5f2ee1aecf
Merge branch 'ggml-org:master' into sf/deepseek-ocr 2025-12-05 11:56:06 +01:00
Saba Fallah f5bd310a5e minor formatting and style 2025-12-05 09:30:58 +01:00
Saba Fallah 076138a428 corrected code-branch when flash-attn disabled
enabling usage of --flash-attn option
2025-12-04 23:45:59 +01:00
Saba Fallah 5381b9cf63 using common build_attn in sam 2025-12-04 23:13:29 +01:00
bluebread fc3f625fef mtmd: support combined QKV projection in build_vit 2025-12-04 17:57:43 +00:00
Saba Fallah a661c52990 reverting automatically removed spaces 2025-12-04 16:12:41 +01:00
Saba Fallah c73748ab5d Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr-cleanup
# Conflicts:
#	gguf-py/gguf/tensor_mapping.py
2025-12-04 15:09:32 +01:00
Saba Fallah 386ba479a2 clean up 2025-12-04 15:05:58 +01:00
bluebread 7451b84105 mtmd: fix tensor names for image newlines and view separator 2025-12-04 13:26:53 +00:00
bluebread b26b507c4e mtmd: refactor code & remove unused helper functions 2025-12-03 16:23:46 +00:00
bluebread b696c54756 mtmd: remove --dsocr-mode argument 2025-12-03 14:54:16 +00:00
bluebread 43dfc0c8d6 Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr 2025-12-03 07:52:26 +00:00