bluebread
3f71188303
mtmd: correct token order
2025-11-23 09:22:00 +00:00
Saba Fallah
4cfa15fcd7
- image encoding debugged
...
- issues fixed, mainly related to wrong config values like n_patches, etc.
- configs need to be corrected in the converter
2025-11-22 16:57:34 +01:00
Saba Fallah
3fcfc3ace9
Merge pull request #3 from bluebread/sf/deepseek-ocr
...
Fixed get_rel_pos & add_rel_pos_inplace operator
2025-11-22 09:33:15 +01:00
bluebread
effe66958e
mtmd: minor changes
2025-11-22 02:09:37 +00:00
Saba Fallah
86f111f8b7
image encoding technically works but the output can't be checked since image decoding fails
2025-11-21 20:42:14 +01:00
bluebread
7b8d735c90
mtmd: fixed the wrong scale factor for get_rel_pos
2025-11-21 18:04:01 +00:00
bluebread
0f5587dcc0
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr
2025-11-21 17:28:16 +00:00
bluebread
7e9fbeccc5
mtmd: fix get_rel_pos
2025-11-21 17:12:12 +00:00
bluebread
5e6cf3c6a8
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr
2025-11-21 15:36:45 +00:00
bluebread
8bce66d5f2
clip: fixed warnings
2025-11-21 15:28:37 +00:00
Saba Fallah
68b206b65c
sam implementation without using CPU-only ops
2025-11-21 15:29:39 +01:00
bluebread
1268dc3fd1
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr
2025-11-20 13:36:07 +00:00
Saba Fallah
88032f46b1
window partitioning using standard ggml ops
2025-11-20 10:07:54 +01:00
Saba Fallah
89afda8da9
visual_model warmup (technically) works
2025-11-18 10:26:32 +01:00
Saba Fallah
63a042f21e
concat image_newline and image_seperator tokens
2025-11-18 09:43:11 +01:00
bluebread
a65ddf5bdd
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr
2025-11-18 06:19:57 +00:00
bluebread
6c0715befc
fix: update callback for ffn_moe_weighted and add callback for attn_out in deepseek2 model
2025-11-18 06:19:38 +00:00
Saba Fallah
331cea8f8e
corrected combining of image encoders' results
2025-11-18 05:59:37 +01:00
Saba Fallah
1e08157134
clip-vit: split qkv_proj during model conversion
2025-11-17 21:19:51 +01:00
Saba Fallah
8b3d319c03
clip-vit: corrected cls_embd concat
2025-11-17 20:57:51 +01:00
Saba Fallah
cec9a5c6e0
sam: corrected erroneous return
2025-11-17 18:59:40 +01:00
Saba Fallah
790bbb97d8
sam warmup working
2025-11-17 15:27:00 +01:00
Saba Fallah
b32bb5e7da
Merge pull request #2 from bluebread/sf/deepseek-ocr
...
mtmd: DeepseekOCR: implement DeepSeek3B-MoE-A570M (LM component)
2025-11-17 11:27:59 +01:00
Saba Fallah
13dc6fb305
Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr
2025-11-17 11:25:16 +01:00
Saba Fallah
97e0907c5b
loading LM
...
testing Vision model loading
2025-11-17 11:07:33 +01:00
bluebread
e8b2610227
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr
2025-11-17 08:46:27 +00:00
bluebread
2de3436705
mtmd: Fix RoPE type for DeepSeek-OCR LM.
2025-11-17 08:44:29 +00:00
bluebread
76305878d5
mtmd: successfully runs DeepSeek-OCR LM in llama-cli
2025-11-16 08:45:08 +00:00
bluebread
eab28ed318
mtmd: add DeepSeek-OCR LM support with standard attention
2025-11-15 17:28:18 +00:00
Saba Fallah
2aab52e2c4
deepseek-ocr clip-vit model impl
2025-11-15 15:30:07 +01:00
Saba Fallah
578c8d77dc
Merge pull request #1 from bluebread/sf/deepseek-ocr
...
mtmd: fix vision model processing
2025-11-15 11:51:21 +01:00
bluebread
85c7cda8eb
mtmd: fix vision model processing
2025-11-15 04:20:01 +00:00
Saba Fallah
b6b9f02c8a
loading sam tensors
2025-11-14 20:51:48 +01:00
Saba Fallah
43a130b4d0
mtmd: llama.cpp DeepSeekOCR support
...
initial commit
2025-11-14 12:40:20 +01:00
Ruben Ortlam
7f3e9d339c
vulkan: iGPU memory reporting fix ( #17110 )
...
* vulkan: use all device-local heaps for memory availability reporting
Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>
* use all available heaps for iGPU memory reporting
* Allow multiple memory types per buffer request for devices with split heaps
---------
Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>
2025-11-09 09:54:47 +01:00
Ruben Ortlam
8a3519b708
vulkan: fix mmq out of bounds reads ( #17108 )
...
* vulkan: fix mmq out of bounds reads, streamline outdated matmul host code
* fix mul_mat_id quantization call
* Fix compiler warnings
2025-11-09 09:52:57 +01:00
Jeff Bolz
80a6cf6347
vulkan: fuse mul_mat_id + mul ( #17095 )
...
* vulkan: fuse mul_mat_id + mul
This comes up in qwen3 moe.
* split mul_mat_id fusion tests into a separate class
2025-11-09 09:48:42 +01:00
Georgi Gerganov
0750a59903
metal : retain src and dst buffers during async ops ( #17101 )
2025-11-09 08:28:51 +02:00
Xuan-Son Nguyen
aa3b7a90b4
arg: add --cache-list argument to list cached models ( #17073 )
...
* arg: add --cache-list argument to list cached models
* new manifest naming format
* improve naming
* Update common/arg.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-08 21:54:14 +01:00
chansikpark
333f2595a3
webui: fix keyboard shortcuts for new chat & edit chat title ( #17007 )
2025-11-08 20:52:35 +01:00
Jeff Bolz
53d7d21e61
vulkan: Use spec constants for conv2d s/d/p and kernel W/H ( #16978 )
...
* vulkan: Use spec constants for conv2d s/d/p and kernel W/H
Also add some additional unroll hints, which seems to help.
* lock around map lookup
2025-11-08 13:24:29 -06:00
Aidan
eeee367de5
server: fix correct time_ms calculation in prompt_progress ( #17093 )
...
* fix: correct time_ms calculation in send_partial_response
The time_ms field was incorrectly calculated: the division was happening
before the subtraction, leading to incorrect values.
Before: (ggml_time_us() - slot.t_start_process_prompt / 1000)
After:  (ggml_time_us() - slot.t_start_process_prompt) / 1000
* docs : document time_ms field in prompt_progress
2025-11-08 15:12:11 +02:00
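A minimal C++ sketch of the operator-precedence issue described in the commit above; the timestamp values are made up, and the variable names only mirror the commit message — this is not the actual server code.

    #include <cstdint>
    #include <cstdio>

    int main() {
        // stand-in timestamps in microseconds (illustrative values only)
        int64_t now_us   = 5000000; // plays the role of ggml_time_us()
        int64_t start_us = 2000000; // plays the role of slot.t_start_process_prompt

        // buggy: '/' binds tighter than '-', so only the start timestamp
        // is converted to milliseconds before the subtraction
        int64_t time_ms_buggy = now_us - start_us / 1000;

        // fixed: subtract first, then convert the elapsed time to milliseconds
        int64_t time_ms_fixed = (now_us - start_us) / 1000;

        printf("buggy: %lld, fixed: %lld\n",
               (long long) time_ms_buggy, (long long) time_ms_fixed);
        return 0;
    }

With these made-up inputs the buggy form prints 4998000, while the fixed form prints the intended elapsed time of 3000 ms.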
Aman Gupta
64fe17fbb8
Revert "CUDA: add expert reduce kernel ( #16857 )" ( #17100 )
2025-11-08 21:05:19 +08:00
Aman Gupta
c1b187688d
CUDA: skip fusion for repeating adds in bias ( #17080 )
2025-11-08 16:58:05 +08:00
SavicStefan
b8a5cfd11a
vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp ( #16636 )
...
Signed-off-by: Stefan Savic <stefan.savic@huawei.com>
Co-authored-by: Stefan Savic <stefan.savic@huawei.com>
2025-11-08 09:28:22 +01:00
Aleksei Nikiforov
08416ebe7f
ggml: disable vxe for cross-compilation by default ( #16966 )
...
Otherwise compilation will fail because -mvx and -mzvector are enabled
without setting the corresponding -march options.
2025-11-08 16:00:20 +08:00
Jeff Bolz
b4e335d8dc
vulkan: fuse rms_norm + mul + rope (+ view + set_rows) ( #16977 )
...
This change combines the rms_norm+mul and rope+view+set_rows fusions to
allow fusing the whole sequence together. This comes up in Qwen3, Bailing,
and some other models.
2025-11-08 08:52:15 +01:00
Jeff Bolz
d6fe40fa00
vulkan: Fix test-thread-safety crashes ( #17024 )
...
The std::map pipeline_flash_attn_f32_f16 could be searched and inserted into at the
same time, which requires holding the lock. To be safe, hold the lock for all of
ggml_vk_load_shaders.
2025-11-08 08:39:45 +01:00
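A generic C++ sketch of the locking pattern described in the commit above; PipelineCache, its members, and the cache key are placeholders, not the actual ggml_vk structures. The real fix holds a lock around ggml_vk_load_shaders; this only illustrates guarding both the lookup and the insert on a shared std::map with one mutex.

    #include <map>
    #include <mutex>
    #include <string>

    // Placeholder for a shared pipeline cache keyed by name.
    struct PipelineCache {
        std::map<std::string, int> pipelines;
        std::mutex mtx;

        // Look up a pipeline, creating it if missing. Without the lock,
        // a concurrent find() and insert() on the same std::map is a data race.
        int get_or_create(const std::string & key) {
            std::lock_guard<std::mutex> lock(mtx);
            auto it = pipelines.find(key);
            if (it != pipelines.end()) {
                return it->second;
            }
            int handle = (int) pipelines.size(); // stand-in for pipeline creation
            pipelines.emplace(key, handle);
            return handle;
        }
    };

    int main() {
        PipelineCache cache;
        int a = cache.get_or_create("fa_f32_f16_d64"); // hypothetical key
        int b = cache.get_or_create("fa_f32_f16_d64"); // second call hits the cache
        return (a == b) ? 0 : 1;
    }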
Johannes Gäßler
e14e842e87
CUDA: fix MMQ stream-k fixup ne1 indices ( #17089 )
2025-11-08 08:26:18 +01:00
Reese Levine
647b960bd8
ggml webgpu: faster matrix multiplication/matrix-vector multiplication ( #17031 )
...
* Faster tensors (#8 )
Add fast matrix and matrix/vector multiplication.
* Use map for shader replacements instead of pair of strings
2025-11-07 19:27:20 -08:00