Commit Graph

7304 Commits

Author SHA1 Message Date
Adrien Gallouët cb44fc84e8
cmake : fix ARM feature verification (#17170)
* cmake : fix ARM feature verification

Use check_cxx_source_compiles to prevent conflicts with
the existing GGML_NATIVE detection code.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* cmake : unset __ARM_FEATURE when feature is disabled

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* cmake : fix scope, this is really a macro

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* arm_neon.h is useless

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-17 21:37:29 +01:00
Saba Fallah 1e08157134 clip-vit: model convert qkv_proj split 2025-11-17 21:19:51 +01:00
Saba Fallah 8b3d319c03 clip-vit: corrected cls_embd concat 2025-11-17 20:57:51 +01:00
Saba Fallah cec9a5c6e0 sam erroneous return corrected 2025-11-17 18:59:40 +01:00
Saba Fallah 790bbb97d8 sam warmup working 2025-11-17 15:27:00 +01:00
Adrien Gallouët cb623de3fc
ggml : add missing AVX512 feature checks (#17270)
_mm512_cvtepu8_epi16        requires  __AVX512BW__
_mm512_srli_epi16           requires  __AVX512BW__
__builtin_ia32_inserti32x8  requires  __AVX512DQ__

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-17 12:12:00 +01:00
Saba Fallah b32bb5e7da
Merge pull request #2 from bluebread/sf/deepseek-ocr
mtmd: DeepseekOCR Implement DeepSeek3B-MoE-A570M (LM component)
2025-11-17 11:27:59 +01:00
Saba Fallah 13dc6fb305
Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr 2025-11-17 11:25:16 +01:00
Saba Fallah 97e0907c5b loading LM
testing Vision model loading
2025-11-17 11:07:33 +01:00
Georgi Gerganov 7aaeedc098
metal : support I32 -> I32 copy (#17317) 2025-11-17 11:52:00 +02:00
Georgi Gerganov 3347e6d904
metal : faster argsort (#17315)
* metal : faster argsort

* cont : keep data in registers
2025-11-17 11:51:48 +02:00
Georgi Gerganov 1a139644a8
metal : add cumsum (#17305) 2025-11-17 11:51:13 +02:00
bluebread e8b2610227 Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr 2025-11-17 08:46:27 +00:00
bluebread 2de3436705 mtmd: Fix RoPE type for DeepSeek-OCR LM. 2025-11-17 08:44:29 +00:00
hipudding 2376b7758c
CANN: Use smart pointers to manage ACL objects (#17238)
* CANN: Use smart pointers to manage ACL objects

Previously, ACL objects were managed via manual destruction, which
led to multiple memory-leak issues during runtime. This patch replaces
manual memory management with smart pointers so that ACL objects
are properly released and ownership is clearly defined.

Note that the ownership of an ACL object belongs to the function
that creates it. Other internal functions should operate on these ACL
objects using raw pointers to avoid unintended ownership transfers.

Additionally, since aclTensorList automatically frees its contained
aclTensor objects, any aclTensor added to a tensor list must release
ownership to avoid double free operations.

This PR also removes the asynchronous task submission mechanism.
Due to changes in recent CANN versions, tiling time has significantly
decreased. Even with a dual-thread submission model, the dispatch
overhead still falls on the critical path, making async submission
less beneficial. Moreover, aclGraph support provides a much better
path to reducing operator dispatch latency.

* CANN: resolve review comments
2025-11-17 08:43:59 +08:00
Pavels Zaicenkovs dbed61294a
vulkan: add LOG operation support for F32 and F16 (#17183)
* vulkan: add LOG operation support for F32 and F16

Part of #14909.

* vulkan: Fix LOG operation types

* docs: Update operation support documentation for Vulkan LOG operation

* vulkan: fix log_f16 shader

* docs: restore missing LOG test cases and regenerate ops.md
2025-11-16 22:50:09 +01:00
Ruben Ortlam 80deff3648
vulkan: fix MMQ quantize_y condition (#17301) 2025-11-16 19:38:17 +01:00
Eve 8b1c339bd2
ci : revert #16249 (#17303)
* Delete .github/workflows/build-amd.yml

* Update build.yml
2025-11-16 19:09:17 +01:00
bluebread 76305878d5 mtmd: successfully runs DeepSeek-OCR LM in llama-cli 2025-11-16 08:45:08 +00:00
Georgi Gerganov 416e7c7f47
metal : remove obosolete asserts (#17295) 2025-11-16 09:50:26 +02:00
Georgi Gerganov 5b2093becc
server : handle context overflow during decode (#17267)
* server : handle context overflow during decode

* server : minor refactor
2025-11-16 09:23:37 +02:00
lhez 52e5d421f1
opencl: fix rms_norm_mul (#17250)
* opencl: use subgrroup reduce for reduction in rms_norm_mul

* opencl: add comment about workgroup size
2025-11-15 17:40:14 -08:00
shaofeiqi 4db5641210
opencl: add kernel to handle mat mul in attention to improve encoding speed (#17181)
* Add mul_mm_f16_f32_kq_kqv kernel

* Add ggml_cl_mul_mat_kq_kqv_adreno func

* fix whitespace

* remove unused variable

* remove redundant

* refactor and clean up

* remove trailing whitespace
2025-11-15 17:33:10 -08:00
shani-f 72bd7321a7
sycl : unify unary kernels with a generic implementation and enable wide operator support (#17213)
* SYCL: add generic unary op implementation for multiple ops (ABS/SGN/…); unify non-contiguous access

* SYCL: update documentation and sycl.csv to reflect new unary op support

* update ops.md after syncing SYCL.csv changes

* Fix SYCL.csv merge conflict

* Update ops.md after fixing SYCL.csv conflicts

* Fix SYCL.csv tail after merge conflict and regenerate ops.md

* Fix line endings and final newline in SYCL.csv

* Remove TOPK_MOE entries from SYCL.csv as requested

* Update ops.md after removing TOPK_MOE from SYCL.csv

* Regenerated SYCL.csv and synced ops.md with upstream

* Update ops.md using create_ops_docs.py
2025-11-16 00:52:42 +01:00
Aleksander Grygier 22e1ce2f81
webui: Fix clickability around chat processing statistics UI (#17278)
* fix: Better pointer events handling in chat processing info elements

* chore: update webui build output
2025-11-15 22:41:41 +01:00
Pascal 1411d9275a
webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI (#16618)
* webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI

- Purely visual and diagnostic change, no effect on model context, prompt
  construction, or inference behavior

- Captured assistant tool call payloads during streaming and non-streaming
  completions, and persisted them in chat state and storage for downstream use

- Exposed parsed tool call labels beneath the assistant's model info line
  with graceful fallback when parsing fails

- Added tool call badges beneath assistant responses that expose JSON tooltips
  and copy their payloads when clicked, matching the existing model badge styling

- Added a user-facing setting to toggle tool call visibility to the Developer
  settings section directly under the model selector option

* webui: remove scroll listener causing unnecessary layout updates (model selector)

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* chore: npm run format & update webui build output

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-11-15 21:09:32 +01:00
Sigbjørn Skjæret 662192e1dc
convert : remove unnecessary chat template patching (#17289) 2025-11-15 20:58:59 +01:00
Jeff Bolz 24dc769f1b
vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (#17287)
These both show up in gpt-oss. Also, cleanup the mul_mat_vec fusion code a bit.
2025-11-15 19:54:23 +01:00
bluebread eab28ed318 mtmd: add DeepSeek-OCR LM support with standard attention 2025-11-15 17:28:18 +00:00
Saba Fallah 2aab52e2c4 deepseek-ocr clip-vit model impl 2025-11-15 15:30:07 +01:00
Ruben Ortlam 4dca015b7e
vulkan: Replace 16-bit unpack8 calls to work around legacy Windows AMD driver bug (#17285) 2025-11-15 15:18:58 +01:00
Sigbjørn Skjæret 9a8860cf5d
convert : use all parts in safetensors index (#17286) 2025-11-15 14:12:39 +01:00
Sigbjørn Skjæret 9d3ef4809f
convert : set expert gating func in base class (#17279) 2025-11-15 14:06:24 +01:00
Ankur Verma c7b7db0445
mtmd-cli: Avoid logging to stdout for model loading messages in mtmd-cli (#17277) 2025-11-15 12:41:16 +01:00
Giuseppe Scrivano 1568d13c2c
vulkan: implement ABS and NEG (#17245)
* docs: update Vulkan ops

* vulkan: add NEG op

* vulkan: add ABS op

---------

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2025-11-15 12:00:29 +01:00
Jeff Bolz 439342ea0b
vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths (#17244)
* vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths

* set allow_misalign
2025-11-15 11:56:15 +01:00
Saba Fallah 578c8d77dc
Merge pull request #1 from bluebread/sf/deepseek-ocr
mtmd: fix vision model processing
2025-11-15 11:51:21 +01:00
Jeff Bolz 234ae7d7bd
vulkan: skip all-negative-inf blocks in FA (#17186) 2025-11-15 10:37:25 +01:00
Jeff Bolz 38eaf32af1
vulkan: change graph_compute to be async and enable get_tensor_async (#17158)
* vulkan: change graph_compute to be async and enable get_tensor_async

This allows some additional CPU/GPU overlap for large pp workloads. Also seems
to help a bit for token gen, maybe getting rid of a small bubble between
graph_compute and get_tensor.

Async set and copy functions seem to be very rarely used, so I didn't enable
them because I didn't have a good way to test them.

The async commands need to be ordered against each other, so put them all on
the compute queue. The non-async commands still use the transfer queue.

The fence for graph_compute/get_tensor_async is submitted and waited on in
ggml_vk_synchronize.

* fix thread safety errors

* teardown context cleanly

* Handle async read to non-pinned dst
2025-11-15 09:06:41 +01:00
bluebread 85c7cda8eb mtmd: fix vision model processing 2025-11-15 04:20:01 +00:00
Saba Fallah b6b9f02c8a loading sam tensors 2025-11-14 20:51:48 +01:00
Xuan-Son Nguyen 9b17d74ab7
mtmd: add mtmd_log_set (#17268) 2025-11-14 15:56:19 +01:00
Bartowski e1fcf8b09b
model : add AfmoeForCausalLM support (#16477)
* Add AFMOE model support

* Update to vocab

* Add model sizing

* Undo Rope change for ARCEE model

* Address review comments

* Update modeling code is_sliding -> use_rope, replace hard-coded logic

* Fix AFMOE tokenizer

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update AFMoE tokenizer class identification to be more unique

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-14 13:54:10 +01:00
Marek Hradil jr. 6cd0cf72ce
fix : Dangling pointer for non-empty trigger words in lazy grammar construction (#17048)
* fix : Dangling pointer for non-empty trigger words in llama_sampler_init_grammar_impl (#17047)

* Replace 'static' workaround, with keeping variable in scope for longer

* Create std::array directly and pass into llama_grammar_init_impl

* Add back the trigger pattern

* Missed array include
2025-11-14 14:35:26 +02:00
Georgi Gerganov d396b43748
server : fix "can batch with" bug (#17263) 2025-11-14 14:03:45 +02:00
Saba Fallah 43a130b4d0 mtmd: llama.cpp DeepSeekOCR support
init commit
2025-11-14 12:40:20 +01:00
Georgi Gerganov 45c6ef7307
metal : support argsort for ne00 > 1024 (#17247)
* metal : refactor argsort

* cont : sort chunks

* cont : merge sorted buckets

* cont : cleanup
2025-11-14 09:36:06 +02:00
Georgi Gerganov 2606b0adab
metal : make the FA extra sizes consistent (#17143) 2025-11-14 09:13:34 +02:00
ixgbe 307772fcda
readme : add RVV,ZVFH,ZFH,ZICBOP support for RISC-V (#17259)
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-14 09:12:56 +02:00
Aleksander Grygier f1bad23f88
Better UX for handling multiple attachments in WebUI (#17246) 2025-11-14 01:19:08 +01:00