Commit Graph

106 Commits

Author SHA1 Message Date
bluebread 53273f83f8 mtmd: fixed wrong input setting 2025-12-07 23:58:22 +00:00
bluebread 5dfcc5abb1 mtmd: add detailed comments for resize_bicubic_pillow 2025-12-07 10:15:09 +00:00
bluebread 2d918b3e21 mtmd: make sam hparams configurable 2025-12-06 06:55:53 +00:00
bluebread 15f2ada0ed mtmd: simplify get_rel_pos 2025-12-06 06:32:41 +00:00
Saba Fallah d981f19e9d minor editorconfig-check fixes 2025-12-05 13:18:15 +01:00
Saba Fallah 5f2ee1aecf Merge branch 'ggml-org:master' into sf/deepseek-ocr 2025-12-05 11:56:06 +01:00
Saba Fallah f5bd310a5e minor formatting and style 2025-12-05 09:30:58 +01:00
Saba Fallah 076138a428 corrected code-branch when flash-attn disabled 2025-12-04 23:45:59 +01:00
enabling usage of --flash-attn option
Saba Fallah 5381b9cf63 using common build_attn in sam 2025-12-04 23:13:29 +01:00
bluebread fc3f625fef mtmd: support combined QKV projection in buid_vit 2025-12-04 17:57:43 +00:00
Saba Fallah a661c52990 reverting automatically removed spaces 2025-12-04 16:12:41 +01:00
Saba Fallah c73748ab5d Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr-cleanup 2025-12-04 15:09:32 +01:00
# Conflicts:
#	gguf-py/gguf/tensor_mapping.py
Saba Fallah 386ba479a2 clean up 2025-12-04 15:05:58 +01:00
bluebread 7451b84105 mtmd: fix tensor names for image newlines and view separator 2025-12-04 13:26:53 +00:00
bluebread b26b507c4e mtmd: refactor code & remove unused helper functions 2025-12-03 16:23:46 +00:00
bluebread b696c54756 mtmd: remove --dsocr-mode argument 2025-12-03 14:54:16 +00:00
bluebread 43dfc0c8d6 Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr 2025-12-03 07:52:26 +00:00
bluebread e20857ba59 mtmd: simplify DeepSeek-OCR dynamic resolution preprocessing 2025-12-03 07:51:12 +00:00
bluebread c914e05405 mtmd: adapt Pillow image resizing function 2025-12-03 05:18:39 +00:00
Xuan-Son Nguyen a96283adc4 mtmd: fix --no-warmup (#17695) 2025-12-02 22:48:08 +01:00
Saba Fallah 66341666fb Merge branch 'master' into sf/deepseek-ocr 2025-12-02 21:02:13 +01:00
# Conflicts:
#	convert_hf_to_gguf.py
#	tools/mtmd/clip.h
#	tools/mtmd/mtmd.cpp
Xuan-Son Nguyen ecf74a8417 mtmd: add mtmd_context_params::warmup option (#17652) 2025-12-01 21:32:25 +01:00
* mtmd: add mtmd_context_params::warmup option
* reuse the common_params::warmup
bluebread 95239f92b9 mtmd: simplify SAM patch embedding 2025-12-01 07:31:24 +00:00
Tarek Dakhran 2ba719519d model: LFM2-VL fixes (#17577) 2025-11-30 21:57:31 +01:00
* Adjust to pytorch
* Add antialiasing upscale
* Increase number of patches to 1024
* Handle default marker insertion for LFM2
* Switch to flag
* Reformat
* Cuda implementation of antialias kernel
* Change placement in ops.cpp
* consistent float literals
* Pad only for LFM2
* Address PR feedback
* Rollback default marker placement changes
* Fallback to CPU implementation for antialias implementation of upscale
bluebread c5f4c64fe4 mtmd : add --dsocr-mode CLI argument for DeepSeek-OCR resolution control & all native resolution modes work 2025-11-30 16:57:19 +00:00
Xuan-Son Nguyen 7f8ef50cce clip: fix nb calculation for qwen3-vl (#17594) 2025-11-30 15:33:55 +01:00
bluebread 55430945ef Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr 2025-11-30 08:55:29 +00:00
Saba Fallah ed3b7f1056 Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr 2025-11-30 08:29:09 +01:00
# Conflicts:
#	convert_hf_to_gguf.py
#	src/llama-model.cpp
#	src/models/deepseek2.cpp
bluebread 841a4a88df mtmd: debug CLIP-L & first working DeepSeek-OCR model 2025-11-29 16:40:50 +00:00
bluebread ccb2f2385e mtmd: debug CLIP-L (vit_pre_ln) 2025-11-29 07:04:14 +00:00
bluebread a488b495f7 mtmd: SAM numerically works 2025-11-29 02:17:49 +00:00
Han Qingzhe 1d594c295c clip: (minicpmv) fix resampler kq_scale (#17516) 2025-11-26 21:44:07 +01:00
* debug:"solve minicpmv precision problem"
* “debug minicpmv”
* Apply suggestion from @ngxson
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Saba Fallah 206f8abc3c - dynamic resizing 2025-11-23 20:27:02 +01:00
- changes are concerning PR https://github.com/sfallah/llama.cpp/pull/4
Saba Fallah 6dfda99c69 Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr 2025-11-23 12:29:37 +01:00
Saba Fallah 4cfa15fcd7 - image encoding debugged 2025-11-22 16:57:34 +01:00
- issues fixed mainly related wrong config like n_patches etc.
- configs need to be corrected in the converter
bluebread ee8a1488f9 mtmd: add native resolution support 2025-11-22 15:48:13 +00:00
Saba Fallah 3fcfc3ace9 Merge pull request #3 from bluebread/sf/deepseek-ocr 2025-11-22 09:33:15 +01:00
Fixed get_rel_pos & add_rel_pos_inplace operator
bluebread f8f66a151b Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr 2025-11-22 02:22:48 +00:00
bluebread effe66958e mtmd: minor changes 2025-11-22 02:09:37 +00:00
Saba Fallah 86f111f8b7 image encoding technically works but the output can't be checked since image decoding fails 2025-11-21 20:42:14 +01:00
bluebread 7b8d735c90 mtmd: fixed the wrong scaler for get_rel_pos 2025-11-21 18:04:01 +00:00
bluebread 7e9fbeccc5 mtmd: fix get_rel_pos 2025-11-21 17:12:12 +00:00
bluebread 5e6cf3c6a8 Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr 2025-11-21 15:36:45 +00:00
bluebread 8bce66d5f2 clip: fixed warnings 2025-11-21 15:28:37 +00:00
Saba Fallah 68b206b65c sam implementation without using CPU only ops 2025-11-21 15:29:39 +01:00
Saba Fallah 88032f46b1 window partitioning using standard ggml ops 2025-11-20 10:07:54 +01:00
Saba Fallah 89afda8da9 visual_model warmup (technically) works 2025-11-18 10:26:32 +01:00
Saba Fallah 63a042f21e concat image_newline and image_seperator tokens 2025-11-18 09:43:11 +01:00
Saba Fallah 331cea8f8e corrected combining of image encoders' results 2025-11-18 05:59:37 +01:00
Saba Fallah 8b3d319c03 clip-vit: corrected cls_embd concat 2025-11-17 20:57:51 +01:00