bluebread
8bce66d5f2
clip: fixed warnings
2025-11-21 15:28:37 +00:00
bluebread
1268dc3fd1
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr
2025-11-20 13:36:07 +00:00
Saba Fallah
88032f46b1
window partitioning using standard ggml ops
2025-11-20 10:07:54 +01:00
Saba Fallah
89afda8da9
visual_model warmup (technically) works
2025-11-18 10:26:32 +01:00
Saba Fallah
63a042f21e
concat image_newline and image_seperator tokens
2025-11-18 09:43:11 +01:00
bluebread
a65ddf5bdd
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr
2025-11-18 06:19:57 +00:00
bluebread
6c0715befc
fix: update callback for ffn_moe_weighted and add callback for attn_out in deepseek2 model
2025-11-18 06:19:38 +00:00
Saba Fallah
331cea8f8e
corrected combining of image encoders' results
2025-11-18 05:59:37 +01:00
Saba Fallah
1e08157134
clip-vit: model convert qkv_proj split
2025-11-17 21:19:51 +01:00
Saba Fallah
8b3d319c03
clip-vit: corrected cls_embd concat
2025-11-17 20:57:51 +01:00
Saba Fallah
cec9a5c6e0
sam erroneous return corrected
2025-11-17 18:59:40 +01:00
Saba Fallah
790bbb97d8
sam warmup working
2025-11-17 15:27:00 +01:00
Saba Fallah
b32bb5e7da
Merge pull request #2 from bluebread/sf/deepseek-ocr
...
mtmd: DeepseekOCR Implement DeepSeek3B-MoE-A570M (LM component)
2025-11-17 11:27:59 +01:00
Saba Fallah
13dc6fb305
Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr
2025-11-17 11:25:16 +01:00
Saba Fallah
97e0907c5b
loading LM
...
testing Vision model loading
2025-11-17 11:07:33 +01:00
bluebread
e8b2610227
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr
2025-11-17 08:46:27 +00:00
bluebread
2de3436705
mtmd: Fix RoPE type for DeepSeek-OCR LM.
2025-11-17 08:44:29 +00:00
bluebread
76305878d5
mtmd: successfully runs DeepSeek-OCR LM in llama-cli
2025-11-16 08:45:08 +00:00
bluebread
eab28ed318
mtmd: add DeepSeek-OCR LM support with standard attention
2025-11-15 17:28:18 +00:00
Saba Fallah
2aab52e2c4
deepseek-ocr clip-vit model impl
2025-11-15 15:30:07 +01:00
Saba Fallah
578c8d77dc
Merge pull request #1 from bluebread/sf/deepseek-ocr
...
mtmd: fix vision model processing
2025-11-15 11:51:21 +01:00
bluebread
85c7cda8eb
mtmd: fix vision model processing
2025-11-15 04:20:01 +00:00
Saba Fallah
b6b9f02c8a
loading sam tensors
2025-11-14 20:51:48 +01:00
Saba Fallah
43a130b4d0
mtmd: llama.cpp DeepSeekOCR support
...
init commit
2025-11-14 12:40:20 +01:00
Ruben Ortlam
7f3e9d339c
vulkan: iGPU memory reporting fix (#17110)
...
* vulkan: use all device-local heaps for memory availability reporting
Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>
* use all available heaps for iGPU memory reporting
* Allow multiple memory types per buffer request for devices with split heaps
---------
Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>
2025-11-09 09:54:47 +01:00
Ruben Ortlam
8a3519b708
vulkan: fix mmq out of bounds reads (#17108)
...
* vulkan: fix mmq out of bounds reads, streamline outdated matmul host code
* fix mul_mat_id quantization call
* Fix compiler warnings
2025-11-09 09:52:57 +01:00
Jeff Bolz
80a6cf6347
vulkan: fuse mul_mat_id + mul (#17095)
...
* vulkan: fuse mul_mat_id + mul
This comes up in qwen3 moe.
* split mul_mat_id fusion tests into a separate class
2025-11-09 09:48:42 +01:00
Georgi Gerganov
0750a59903
metal : retain src and dst buffers during async ops (#17101)
2025-11-09 08:28:51 +02:00
Xuan-Son Nguyen
aa3b7a90b4
arg: add --cache-list argument to list cached models (#17073)
...
* arg: add --cache-list argument to list cached models
* new manifest naming format
* improve naming
* Update common/arg.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-08 21:54:14 +01:00
chansikpark
333f2595a3
webui: fix keyboard shortcuts for new chat & edit chat title (#17007)
2025-11-08 20:52:35 +01:00
Jeff Bolz
53d7d21e61
vulkan: Use spec constants for conv2d s/d/p and kernel W/H (#16978)
...
* vulkan: Use spec constants for conv2d s/d/p and kernel W/H
Also add some additional unroll hints, which seems to help.
* lock around map lookup
2025-11-08 13:24:29 -06:00
Aidan
eeee367de5
server: fix correct time_ms calculation in prompt_progress (#17093)
...
* fix: correct time_ms calculation in send_partial_response
The time_ms field was incorrectly calculated. The division was happening
before the subtraction leading to incorrect values.
Before: (ggml_time_us() - slot.t_start_process_prompt / 1000)
After: (ggml_time_us() - slot.t_start_process_prompt) / 1000
* docs : document time_ms field in prompt_progress
2025-11-08 15:12:11 +02:00
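Note on the time_ms fix above: the bug is purely one of operator precedence. A minimal sketch, assuming two hypothetical helpers that are not part of llama.cpp; `now_us` stands in for the value of ggml_time_us() and `start_us` for slot.t_start_process_prompt, both in microseconds:

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; they are not llama.cpp functions.

// Buggy form: '/' binds tighter than '-', so only start_us is divided by 1000
// and the result mixes microseconds with milliseconds.
inline int64_t elapsed_ms_buggy(int64_t now_us, int64_t start_us) {
    return now_us - start_us / 1000;
}

// Fixed form: take the difference in microseconds first, then convert to ms.
inline int64_t elapsed_ms_fixed(int64_t now_us, int64_t start_us) {
    return (now_us - start_us) / 1000;
}
```

With now_us = 5000000 and start_us = 2000000, the fixed form yields 3000 ms, while the buggy form yields 4998000, which is neither microseconds nor milliseconds.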
Aman Gupta
64fe17fbb8
Revert "CUDA: add expert reduce kernel (#16857)" (#17100)
2025-11-08 21:05:19 +08:00
Aman Gupta
c1b187688d
CUDA: skip fusion for repeating adds in bias (#17080)
2025-11-08 16:58:05 +08:00
SavicStefan
b8a5cfd11a
vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp (#16636)
...
Signed-off-by: Stefan Savic <stefan.savic@huawei.com>
Co-authored-by: Stefan Savic <stefan.savic@huawei.com>
2025-11-08 09:28:22 +01:00
Aleksei Nikiforov
08416ebe7f
ggml: disable vxe for cross-compilation by default (#16966)
...
Otherwise compilation will fail due to enabling -mvx -mzvector
and not setting corresponding -march options.
2025-11-08 16:00:20 +08:00
Jeff Bolz
b4e335d8dc
vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (#16977)
...
This change combines the rms_norm+mul and rope+view+set_rows fusions to
allow fusing the whole sequence together. This comes up in Qwen3, Bailing,
and some other models.
2025-11-08 08:52:15 +01:00
Jeff Bolz
d6fe40fa00
vulkan: Fix test-thread-safety crashes (#17024)
...
The std::map pipeline_flash_attn_f32_f16 could be searched and inserted at the
same time, which needs to hold the lock. To be safe, hold the lock for all of
ggml_vk_load_shaders.
2025-11-08 08:39:45 +01:00
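Note on the thread-safety fix above: a std::map must not be searched by one thread while another inserts into it, so lookup and insertion have to happen under the same lock. A minimal sketch of that pattern, assuming a hypothetical `PipelineCache` class with string keys and integer handles; it does not reproduce the actual ggml_vk_load_shaders code:

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical cache for illustration; the names and types are assumptions.
class PipelineCache {
public:
    // Look up a pipeline handle, creating it on first use.
    // The lock is held across both the find and the insert, so no other
    // thread can observe or modify the map between the two steps.
    int get_or_create(const std::string & key) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = pipelines_.find(key);
        if (it != pipelines_.end()) {
            return it->second;
        }
        const int id = next_id_++;
        pipelines_.emplace(key, id);
        return id;
    }

    // Reading the size also takes the lock, since it touches the same map.
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return pipelines_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, int> pipelines_;
    int next_id_ = 0;
};
```

Holding one coarse lock for the whole operation trades some parallelism for simplicity, which matches the "to be safe" reasoning in the commit message.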
Johannes Gäßler
e14e842e87
CUDA: fix MMQ stream-k fixup ne1 indices (#17089)
2025-11-08 08:26:18 +01:00
Reese Levine
647b960bd8
ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031)
...
* Faster tensors (#8)
Add fast matrix and matrix/vector multiplication.
* Use map for shader replacements instead of pair of strings
2025-11-07 19:27:20 -08:00
bssrdf
299f5d782c
CUDA: properly handle nb00=nb02 case for cpy (#17081)
2025-11-07 23:41:58 +01:00
Acly
ac76d36201
vulkan : refactor buffer handling in vk_op_f32 (#16840)
...
* vulkan : refactor/simplify buffer handling in vk_op_* functions
* Combine UMA handling into ggml_vk_tensor_subbuffer
2025-11-07 21:08:50 +01:00
Johannes Gäßler
6515610506
CUDA: fix should_use_mmvf for ne11 == 1 (#17085)
...
* CUDA: fix should_use_mmvf for ne11 == 1
* Apply suggestion from @am17an
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
---------
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
2025-11-07 20:53:14 +01:00
Georgi Gerganov
7956bb4d7f
bench : cache the llama_context state at computed depth (#16944)
...
* bench : cache llama_context state at depth
* cont : handle failures to restore the old state
* cont : print information when the state is being reused
2025-11-07 21:23:11 +02:00
Sigbjørn Skjæret
9008027aa3
hparams : add n_embd_inp() to support extended embed (#16928)
...
* add n_embd_full to support extended embed
* don't change output
* rename to n_embd_inp
* restore n_embd where applicable
2025-11-07 19:27:58 +01:00
Georgi Gerganov
16bcc1259d
kv-cache : pad the cache size to 256 for performance (#17046)
...
* kv-cache : pad the size of the small SWA cache for performance
* context : pad the total context to 256
* cont : future-proof the swa pad
* server : adjust test params to new logic
2025-11-07 20:03:25 +02:00
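Note on the padding change above: the commit only states that sizes are padded to 256. A minimal sketch of what such padding usually means, rounding a size up to the next multiple; the helper name `pad_to` is hypothetical and this is not the code from the commit:

```cpp
#include <cstdint>

// Hypothetical helper for illustration: round x up to the next multiple of n.
// Assumes n > 0 and that x + n - 1 does not overflow uint32_t.
constexpr uint32_t pad_to(uint32_t x, uint32_t n) {
    return ((x + n - 1) / n) * n;
}
```

Under this assumption, a requested size of 1000 becomes 1024, and a size that is already a multiple of 256 is left unchanged.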
Adrien Gallouët
9eb9a1331d
Revert "ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)" (#17084)
...
This reverts commit 7c23f3f0d4.
2025-11-07 18:34:05 +02:00
iron
7c23f3f0d4
ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)
...
When using GCC 9 and GCC 12 on the arm64 platform of Ubuntu 20.04,
the command "gcc -mcpu=native -E -v -" fails to detect the correct CPU flags,
which results in compilation failures for certain extended instructions,
but the correct CPU flags can be obtained by using gcc -march.
Signed-off-by: lizhenneng <lizhenneng@kylinos.cn>
Co-authored-by: lizhenneng <lizhenneng@kylinos.cn>
2025-11-07 08:18:14 -08:00
Georgi Gerganov
8c0d6bb455
server : print the samplers chain for each request (#17070)
2025-11-07 12:24:47 +02:00
Xuan-Son Nguyen
5c9a18e674
common: move download functions to download.(cpp|h) (#17059)
...
* common: move download functions to download.(cpp|h)
* rm unused includes
* minor cleanup
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-07 11:23:34 +01:00