llama.cpp

Commit Graph

Author	SHA1	Message	Date
Saba Fallah	6dfda99c69	Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr	2025-11-23 12:29:37 +01:00
Saba Fallah	4cfa15fcd7	- image encoding debugged - issues fixed mainly related wrong config like n_patches etc. - configs need to be corrected in the converter	2025-11-22 16:57:34 +01:00
bluebread	ee8a1488f9	mtmd: add native resolution support	2025-11-22 15:48:13 +00:00
Saba Fallah	3fcfc3ace9	Merge pull request #3 from bluebread/sf/deepseek-ocr Fixed get_rel_pos & add_rel_pos_inplace operator	2025-11-22 09:33:15 +01:00
bluebread	f8f66a151b	Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr	2025-11-22 02:22:48 +00:00
bluebread	effe66958e	mtmd: minor changed	2025-11-22 02:09:37 +00:00
Saba Fallah	86f111f8b7	image encoding technically works but the output can't be checked singe image decoding fails	2025-11-21 20:42:14 +01:00
bluebread	7b8d735c90	mtmd: fixed the wrong scaler for get_rel_pos	2025-11-21 18:04:01 +00:00
bluebread	7e9fbeccc5	mtmd: fix get_rel_pos	2025-11-21 17:12:12 +00:00
bluebread	5e6cf3c6a8	Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr	2025-11-21 15:36:45 +00:00
bluebread	8bce66d5f2	clip: fixed warnings	2025-11-21 15:28:37 +00:00
Saba Fallah	68b206b65c	sam implementation without using CPU only ops	2025-11-21 15:29:39 +01:00
Saba Fallah	88032f46b1	window partitioning using standard ggml ops	2025-11-20 10:07:54 +01:00
Saba Fallah	89afda8da9	visual_model warmup (technically) works	2025-11-18 10:26:32 +01:00
Saba Fallah	63a042f21e	concat image_newline and image_seperator tokens	2025-11-18 09:43:11 +01:00
Saba Fallah	331cea8f8e	corrected combining of image encoders' results	2025-11-18 05:59:37 +01:00
Saba Fallah	8b3d319c03	clip-vit: corrected cls_embd concat	2025-11-17 20:57:51 +01:00
Saba Fallah	cec9a5c6e0	sam erroneous return corrected	2025-11-17 18:59:40 +01:00
Saba Fallah	790bbb97d8	sam warmup working	2025-11-17 15:27:00 +01:00
Saba Fallah	97e0907c5b	loading LM testing Vision model loading	2025-11-17 11:07:33 +01:00
Saba Fallah	2aab52e2c4	deepseek-ocr clip-vit model impl	2025-11-15 15:30:07 +01:00
Saba Fallah	b6b9f02c8a	loading sam tensors	2025-11-14 20:51:48 +01:00
Saba Fallah	43a130b4d0	mtmd: llama.cpp DeepSeekOCR support init commit	2025-11-14 12:40:20 +01:00
Xuan-Son Nguyen	4882f0ff78	clip: implement minicpm-v sinusoidal embd using GGML (#17036) * clip: implement minicpm-v sinusoidal embd using GGML * fix repeat op	2025-11-06 11:02:54 +01:00
Xuan-Son Nguyen	92bb84f775	mtmd: allow QwenVL to process larger image by default (#17020)	2025-11-05 14:26:49 +01:00
Xuan-Son Nguyen	2f0c2db43e	mtmd: improve struct initialization (#16981)	2025-11-05 11:26:37 +01:00
Xuan-Son Nguyen	070ff4d535	mtmd: add --image-min/max-tokens (#16921)	2025-11-03 11:11:18 +01:00
Xuan-Son Nguyen	bf7b0c9725	mtmd: pad mask for qwen2.5vl (#16954) * mtmd: pad mask for qwen2.5vl * improve	2025-11-03 10:25:55 +01:00
Zhiyong Wang	6b9a52422b	model: add Janus Pro for image understanding (#16906 ) * Add support for Janus Pro * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Address reviewer suggestions Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Add JANUS_PRO constant * Update clip model handling Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> * Update tools/mtmd/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Refactor JANUS_PRO handling in clip.cpp Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> * Update tools/mtmd/clip.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * em whitespace --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-11-02 22:08:04 +01:00
Georgi Gerganov	2f966b8ed8	clip : use FA (#16837) * clip : use FA * cont : add warning about unsupported ops * implement "auto" mode for clip flash attn * clip : print more detailed op support info during warmup * cont : remove obsolete comment [no ci] * improve debugging message * trailing space * metal : remove stray return --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-11-02 21:21:48 +01:00
Xuan-Son Nguyen	cf659bbb8e	mtmd: refactor preprocessing + support max/min pixels (#16878) * mtmd: refactor preprocessing + support max/min pixels * fix mlp type * implement mix/max pixels * improve hparams * better image preproc for qwen * fix * fix out of bound composite * fix (2) * fix token calculation * get_merge_kernel_size() * fix llama4 and lfm2 * gonna fix them all * use simple resize for qwen * qwen: increase min tokens * no resize if dst size == src size * restore to initial min/max tokens value for qwen	2025-11-01 15:51:36 +01:00
JJJYmmm	d261223d24	model: add support for qwen3vl series (#16780 ) * support qwen3vl series. Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com> Co-authored-by: yairpatch <yairpatch@users.noreply.github.com> Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com> * bugfix: fix the arch check for qwen3vl-moe. * use build_ffn * optimize deepstack structure * optimize deepstack feature saving * Revert "optimize deepstack feature saving" for temporal fix This reverts commit `f321b9fdf1`. * code clean * use fused qkv in clip * clean up / rm is_deepstack_layers for simplification * add test model * move test model to "big" section * fix imrope check * remove trailing whitespace * fix rope fail * metal : add imrope support * add imrope support for sycl * vulkan: add imrope w/o check * fix vulkan * webgpu: add imrope w/o check * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix tensor mapping --------- Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com> Co-authored-by: yairpatch <yairpatch@users.noreply.github.com> Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-10-30 16:19:14 +01:00
Tianyue-Zhao	bacddc049a	model: Add support for CogVLM model (#15002 ) * Added GGUF mappings for CogVLM model * Add tensor mapping for CogVLM visual encoder * Add CogVLM to conversion script, no vision part yet * Added CogVLM vision model to conversion script * Add graph for CogVLM CLIP model * Add graph for CogVLM * Fixes for CogVLM. Now compiles. * Model now runs * Fixes for cogvlm graph * Account for graph context change after rebase * Changes for whitespace * Changes in convert script according to comments * Switch CogVLM LLM graph to merged QKV tensor * Use rope_type variable instead of direct definition * Change CogVLM CLIP encoder to use SWIGLU * Switch CogVLM CLIP to use merged QKV * Apply rebase edits and remove ggml_cont call that is now unnecessary * clean up --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-10-30 12:18:50 +01:00
Xuan-Son Nguyen	e1ab084803	mtmd : fix idefics3 preprocessing (#16806) * mtmd : fix idefics3 preprocessing * disable granite test * fix test for granite	2025-10-27 23:12:16 +01:00
Xuan-Son Nguyen	c55d53acec	model : add LightOnOCR-1B model (#16764) * model : add LightOnOCR-1B model * add test	2025-10-27 16:02:58 +01:00
Xuan-Son Nguyen	1bb4f43380	mtmd : support home-cooked Mistral Small Omni (#14928)	2025-10-16 19:00:31 +02:00
Gabe Goodhart	ca71fb9b36	model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206) * feat: Add granite-docling conversion using trillion pretokenizer Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Add granite-docling vocab pre enum Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Use granite-docling pre Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Add clip_is_idefics3 Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Allow multi-token boundary sequences for image templating Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Add tiling support for idefices3 in clip.cpp This should likely be moved into llava_uhd::get_slice_instructions, but for now this avoids disrupting the logic there. Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Partial support for full templating for idefics3 in mtmd There are still errors encoding some of the image chunks, but the token sequence now matches transformers _almost_ perfectly, except for the double newline before the global image which shows up as two consecutive newline tokens instead of a single double-newline token. I think this is happening because the blocks are tokenized separately then concatenated. Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Fully working image preprocessing for idefics3 w/ resize and slicing Branch: gabe-l-hart/GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Parse the preprocessor config's longest side and add it to the mmproj hparams Branch: GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Use the longest side instead of size * scale_factor For Granite Docling, these come out to the same value, but that was just a conicidence. Branch: GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Allow batch encoding and remove clip_is_idefics3 Branch: GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * refactor: Remove unnecessary conditionals for empty token vectors Branch: GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * refactor: Use image_manipulation util Branch: GraniteDocling Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * add test model --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-10-05 14:57:47 +02:00
Aleksei Nikiforov	cc1cfa277b	mtmd : fix uninitialized variable in bicubic_resize (#16275) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>	2025-09-26 15:00:44 +02:00
Diego Devesa	50f4281a6f	llama : allow using iGPUs with --device (#15951) * llama : allow using iGPUs with --device * mtmd : allow iGPU * rpc-server : allow iGPU	2025-09-13 16:49:49 +02:00
Xuan-Son Nguyen	79a546220c	mtmd : support Kimi VL model (#15458) * convert : fix tensor naming conflict for llama 4 vision * convert ok * support kimi vision model * clean up * fix style * fix calc number of output tokens * refactor resize_position_embeddings * add test case * rename build fn * correct a small bug	2025-08-26 12:54:19 +02:00
tc-mb	c4e9239064	model : support MiniCPM-V 4.5 (#15575)	2025-08-26 10:05:55 +02:00
Tarek Dakhran	e288693669	readme : model : mtdm : lfm2 improvements (#15476) * Support untied embeddings * Increase number of image tokens to 1024 * Add LFM2-VL to readme * Actually use untied embeddings	2025-08-22 09:29:08 +02:00
Michael Giba	b108e42904	ci : fix -Werror=return-type in clip.cpp so ci/run.sh can run without issue (#15221) * Fix -Werror=return-type so ci/run.sh can run * Update tools/mtmd/clip.cpp Co-authored-by: Diego Devesa <slarengh@gmail.com> * Remove false now that we have abort --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-08-21 12:06:46 +02:00
Xuan-Son Nguyen	f08c4c0d8d	mtmd : clean up clip_n_output_tokens (#15391)	2025-08-18 22:53:52 +02:00
Sigbjørn Skjæret	baa9255a45	llama : merge conts and reshapes and remove unnecessary cont (#15380) * remove unnecessary conts and merge reshapes * restore necessary conts * merge more conts and reshapes * merge even more conts and reshapes	2025-08-18 19:30:17 +02:00
Tarek Dakhran	65349f26f2	model : support vision LiquidAI LFM2-VL family (#15347) * wip lfm2 vision model * Fix conv weight * Implement dynamic resolution * Fix cuda * support LFM2-VL-450M * happy CI * Remove extra `ggml_conv` and put others into the right place Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-08-16 23:33:54 +02:00
rainred	cf9e5648a7	mtmd : Fix MinicpmV model converter and clip to avoid using hardcode. (#14750) * Fix MinicpmV model converter and clip to avoid using hardcode. * Code update for pr/14750 * Remove unused field, update script path in docs. * Add version 5 for fallback code. --------- Co-authored-by: lzhang <zhanglei@modelbest.cn>	2025-08-11 16:12:12 +02:00
tc-mb	952a47f455	mtmd : support MiniCPM-V 4.0 (#14983) * support minicpm-v 4 * add md * support MiniCPM-o 4.0 * add default location * temp rm MiniCPM-o 4.0 * fix code * fix "minicpmv_projector" default path	2025-07-31 17:22:17 +02:00
Xuan-Son Nguyen	00fa15fedc	mtmd : add support for Voxtral (#14862) * mtmd : add support for Voxtral * clean up * fix python requirements * add [BEGIN_AUDIO] token * also support Devstral conversion * add docs and tests * fix regression for ultravox * minor coding style improvement * correct project activation fn * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-07-28 15:01:48 +02:00
kiwi	749e0d27f0	mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (#14503) * [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip * Update export-lora.cpp * Update clip.cpp * Update export-lora.cpp * format: use space to replace tab	2025-07-25 13:08:04 +02:00

71 Commits