llama.cpp

Commit Graph

Author	SHA1	Message	Date
bluebread	e20857ba59	mtmd: simplify DeepSeek-OCR dynamic resolution preprocessing	2025-12-03 07:51:12 +00:00
bluebread	c914e05405	mtmd: adapt Pillow image resizing function	2025-12-03 05:18:39 +00:00
bluebread	95239f92b9	mtmd: simplify SAM patch embedding	2025-12-01 07:31:24 +00:00
bluebread	c5f4c64fe4	mtmd : add --dsocr-mode CLI argument for DeepSeek-OCR resolution control & all native resolution modes work	2025-11-30 16:57:19 +00:00
bluebread	55430945ef	Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr	2025-11-30 08:55:29 +00:00
Saba Fallah	ed3b7f1056	Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr # Conflicts: # convert_hf_to_gguf.py # src/llama-model.cpp # src/models/deepseek2.cpp	2025-11-30 08:29:09 +01:00
bluebread	841a4a88df	mtmd: debug CLIP-L & first working DeepSeek-OCR model	2025-11-29 16:40:50 +00:00
bluebread	ccb2f2385e	mtmd: debug CLIP-L (vit_pre_ln)	2025-11-29 07:04:14 +00:00
bluebread	a488b495f7	mtmd: SAM numerically works	2025-11-29 02:17:49 +00:00
Han Qingzhe	1d594c295c	clip: (minicpmv) fix resampler kq_scale (#17516) * debug:"solve minicpmv precision problem" * “debug minicpmv” * Apply suggestion from @ngxson --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-11-26 21:44:07 +01:00
Saba Fallah	206f8abc3c	- dynamic resizing - changes are concerning PR https://github.com/sfallah/llama.cpp/pull/4	2025-11-23 20:27:02 +01:00
Saba Fallah	6dfda99c69	Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr	2025-11-23 12:29:37 +01:00
Saba Fallah	4cfa15fcd7	- image encoding debugged - issues fixed mainly related wrong config like n_patches etc. - configs need to be corrected in the converter	2025-11-22 16:57:34 +01:00
bluebread	ee8a1488f9	mtmd: add native resolution support	2025-11-22 15:48:13 +00:00
Saba Fallah	3fcfc3ace9	Merge pull request #3 from bluebread/sf/deepseek-ocr Fixed get_rel_pos & add_rel_pos_inplace operator	2025-11-22 09:33:15 +01:00
bluebread	f8f66a151b	Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr	2025-11-22 02:22:48 +00:00
bluebread	effe66958e	mtmd: minor changed	2025-11-22 02:09:37 +00:00
Saba Fallah	86f111f8b7	image encoding technically works but the output can't be checked singe image decoding fails	2025-11-21 20:42:14 +01:00
bluebread	7b8d735c90	mtmd: fixed the wrong scaler for get_rel_pos	2025-11-21 18:04:01 +00:00
bluebread	7e9fbeccc5	mtmd: fix get_rel_pos	2025-11-21 17:12:12 +00:00
bluebread	5e6cf3c6a8	Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr	2025-11-21 15:36:45 +00:00
bluebread	8bce66d5f2	clip: fixed warnings	2025-11-21 15:28:37 +00:00
Saba Fallah	68b206b65c	sam implementation without using CPU only ops	2025-11-21 15:29:39 +01:00
Saba Fallah	88032f46b1	window partitioning using standard ggml ops	2025-11-20 10:07:54 +01:00
Saba Fallah	89afda8da9	visual_model warmup (technically) works	2025-11-18 10:26:32 +01:00
Saba Fallah	63a042f21e	concat image_newline and image_seperator tokens	2025-11-18 09:43:11 +01:00
Saba Fallah	331cea8f8e	corrected combining of image encoders' results	2025-11-18 05:59:37 +01:00
Saba Fallah	8b3d319c03	clip-vit: corrected cls_embd concat	2025-11-17 20:57:51 +01:00
Saba Fallah	cec9a5c6e0	sam erroneous return corrected	2025-11-17 18:59:40 +01:00
Saba Fallah	790bbb97d8	sam warmup working	2025-11-17 15:27:00 +01:00
Saba Fallah	97e0907c5b	loading LM testing Vision model loading	2025-11-17 11:07:33 +01:00
Saba Fallah	2aab52e2c4	deepseek-ocr clip-vit model impl	2025-11-15 15:30:07 +01:00
Saba Fallah	b6b9f02c8a	loading sam tensors	2025-11-14 20:51:48 +01:00
Xuan-Son Nguyen	9b17d74ab7	mtmd: add mtmd_log_set (#17268)	2025-11-14 15:56:19 +01:00
Saba Fallah	43a130b4d0	mtmd: llama.cpp DeepSeekOCR support init commit	2025-11-14 12:40:20 +01:00
Xuan-Son Nguyen	4b13a684c5	mtmd: fix patch_size initialized to random value in audio models (#17128) * mtmd: fix patch_size initialized to random value in audio models * add default hparams	2025-11-10 11:41:05 +01:00
Xuan-Son Nguyen	4882f0ff78	clip: implement minicpm-v sinusoidal embd using GGML (#17036) * clip: implement minicpm-v sinusoidal embd using GGML * fix repeat op	2025-11-06 11:02:54 +01:00
Xuan-Son Nguyen	92bb84f775	mtmd: allow QwenVL to process larger image by default (#17020)	2025-11-05 14:26:49 +01:00
Xuan-Son Nguyen	2f0c2db43e	mtmd: improve struct initialization (#16981)	2025-11-05 11:26:37 +01:00
Xuan-Son Nguyen	070ff4d535	mtmd: add --image-min/max-tokens (#16921)	2025-11-03 11:11:18 +01:00
Xuan-Son Nguyen	bf7b0c9725	mtmd: pad mask for qwen2.5vl (#16954) * mtmd: pad mask for qwen2.5vl * improve	2025-11-03 10:25:55 +01:00
Zhiyong Wang	6b9a52422b	model: add Janus Pro for image understanding (#16906 ) * Add support for Janus Pro * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Address reviewer suggestions Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Add JANUS_PRO constant * Update clip model handling Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> * Update tools/mtmd/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Refactor JANUS_PRO handling in clip.cpp Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> * Update tools/mtmd/clip.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * em whitespace --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-11-02 22:08:04 +01:00
Georgi Gerganov	2f966b8ed8	clip : use FA (#16837) * clip : use FA * cont : add warning about unsupported ops * implement "auto" mode for clip flash attn * clip : print more detailed op support info during warmup * cont : remove obsolete comment [no ci] * improve debugging message * trailing space * metal : remove stray return --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-11-02 21:21:48 +01:00
Xuan-Son Nguyen	cf659bbb8e	mtmd: refactor preprocessing + support max/min pixels (#16878) * mtmd: refactor preprocessing + support max/min pixels * fix mlp type * implement mix/max pixels * improve hparams * better image preproc for qwen * fix * fix out of bound composite * fix (2) * fix token calculation * get_merge_kernel_size() * fix llama4 and lfm2 * gonna fix them all * use simple resize for qwen * qwen: increase min tokens * no resize if dst size == src size * restore to initial min/max tokens value for qwen	2025-11-01 15:51:36 +01:00
JJJYmmm	d261223d24	model: add support for qwen3vl series (#16780 ) * support qwen3vl series. Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com> Co-authored-by: yairpatch <yairpatch@users.noreply.github.com> Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com> * bugfix: fix the arch check for qwen3vl-moe. * use build_ffn * optimize deepstack structure * optimize deepstack feature saving * Revert "optimize deepstack feature saving" for temporal fix This reverts commit `f321b9fdf1`. * code clean * use fused qkv in clip * clean up / rm is_deepstack_layers for simplification * add test model * move test model to "big" section * fix imrope check * remove trailing whitespace * fix rope fail * metal : add imrope support * add imrope support for sycl * vulkan: add imrope w/o check * fix vulkan * webgpu: add imrope w/o check * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix tensor mapping --------- Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com> Co-authored-by: yairpatch <yairpatch@users.noreply.github.com> Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-10-30 16:19:14 +01:00
Tianyue-Zhao	bacddc049a	model: Add support for CogVLM model (#15002 ) * Added GGUF mappings for CogVLM model * Add tensor mapping for CogVLM visual encoder * Add CogVLM to conversion script, no vision part yet * Added CogVLM vision model to conversion script * Add graph for CogVLM CLIP model * Add graph for CogVLM * Fixes for CogVLM. Now compiles. * Model now runs * Fixes for cogvlm graph * Account for graph context change after rebase * Changes for whitespace * Changes in convert script according to comments * Switch CogVLM LLM graph to merged QKV tensor * Use rope_type variable instead of direct definition * Change CogVLM CLIP encoder to use SWIGLU * Switch CogVLM CLIP to use merged QKV * Apply rebase edits and remove ggml_cont call that is now unnecessary * clean up --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-10-30 12:18:50 +01:00
Xuan-Son Nguyen	e1ab084803	mtmd : fix idefics3 preprocessing (#16806) * mtmd : fix idefics3 preprocessing * disable granite test * fix test for granite	2025-10-27 23:12:16 +01:00
Xuan-Son Nguyen	c55d53acec	model : add LightOnOCR-1B model (#16764) * model : add LightOnOCR-1B model * add test	2025-10-27 16:02:58 +01:00
Xuan-Son Nguyen	1bb4f43380	mtmd : support home-cooked Mistral Small Omni (#14928)	2025-10-16 19:00:31 +02:00
Gabe Goodhart	ca71fb9b36	model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206 ) * feat: Add granite-docling conversion using trillion pretokenizer Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Add granite-docling vocab pre enum Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Use granite-docling pre Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Add clip_is_idefics3 Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Allow multi-token boundary sequences for image templating Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Add tiling support for idefices3 in clip.cpp This should likely be moved into llava_uhd::get_slice_instructions, but for now this avoids disrupting the logic there. Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Partial support for full templating for idefics3 in mtmd There are still errors encoding some of the image chunks, but the token sequence now matches transformers _almost_ perfectly, except for the double newline before the global image which shows up as two consecutive newline tokens instead of a single double-newline token. I think this is happening because the blocks are tokenized separately then concatenated. Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Fully working image preprocessing for idefics3 w/ resize and slicing Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Parse the preprocessor config's longest side and add it to the mmproj hparams Branch: GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Use the longest side instead of size * scale_factor For Granite Docling, these come out to the same value, but that was just a conicidence. 
Branch: GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Allow batch encoding and remove clip_is_idefics3 Branch: GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * refactor: Remove unnecessary conditionals for empty token vectors Branch: GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * refactor: Use image_manipulation util Branch: GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * add test model --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-10-05 14:57:47 +02:00

84 Commits