Commit Graph

5771 Commits

Georgi Gerganov 6179578988
batch : require non-coupled batch with sequential split_equal
ggml-ci
2025-06-25 17:20:46 +03:00
Georgi Gerganov 5eb1a88dc0
batch : optional requirement for sequential sequence ids
ggml-ci
2025-06-25 17:02:38 +03:00
Georgi Gerganov 6663128448
kv-cache : rework kv_idxs, support seq_cp
ggml-ci
2025-06-25 14:48:47 +03:00
Georgi Gerganov 0bb1da5854
kv-cache : simplify set_rows logic
ggml-ci
2025-06-24 23:24:00 +03:00
Georgi Gerganov 165d822044
graph : support iSWA virtual sequences
ggml-ci
2025-06-24 20:35:16 +03:00
Georgi Gerganov 1b74b9d73b
ggml : extend support for n_seq for soft_max and fattn
ggml-ci
2025-06-24 20:23:56 +03:00
Georgi Gerganov 8c68219835
kv-cache : fix non-FA path with virtual sequences
ggml-ci
2025-06-24 20:01:10 +03:00
Georgi Gerganov 7c6487b22f
metal : extend ggml_soft_max_ext() to support n_seq dim 2025-06-24 20:00:40 +03:00
Georgi Gerganov 401c13e3c3
cont : fix build
ggml-ci
2025-06-24 15:59:47 +03:00
Georgi Gerganov 132143938f
tools : tmp adjustments (TMP)
ggml-ci
2025-06-24 15:21:35 +03:00
Georgi Gerganov 52b9007176
llama : add "virtual sequences"
ggml-ci
2025-06-24 15:02:52 +03:00
Georgi Gerganov 36f8e20d08
kv-cache : utilize ggml_set_rows broadcast
ggml-ci
2025-06-23 13:22:51 +03:00
Georgi Gerganov 332f073589
cont : support non-continuous slots
ggml-ci
2025-06-23 13:22:47 +03:00
Georgi Gerganov 39d0b1e8df
cont : kv-cells cp/set for non-cont slots
ggml-ci
2025-06-23 13:21:37 +03:00
Georgi Gerganov f875d6cb72
cont : migrate to using set of indices instead of slot head
ggml-ci
2025-06-23 13:21:36 +03:00
Georgi Gerganov db2bb378b1
cont : gate the ggml_set_rows usage with env var
ggml-ci
2025-06-23 13:21:36 +03:00
Georgi Gerganov 79dac3c861
kv-cache : use ggml_set_rows
ggml-ci
2025-06-23 13:21:36 +03:00
Radoslav Gerganov 1f647b5992
ggml : fix supports_op 2025-06-23 13:21:36 +03:00
Radoslav Gerganov eba97574da
ggml : simplify forward_dup_f32 2025-06-23 13:21:36 +03:00
Georgi Gerganov c0cfc2f78b
metal : add ggml_set_rows implementation
ggml-ci
2025-06-23 13:21:36 +03:00
Georgi Gerganov 828e5d2fcd
tests : add ggml_set_rows 2025-06-23 13:21:35 +03:00
Georgi Gerganov e73690a69d
ggml : ggml_set_rows update comment + better index name 2025-06-23 13:21:35 +03:00
Georgi Gerganov e89709721b
ggml : support GGML_TYPE_F32 ".from_float" trait 2025-06-23 13:21:35 +03:00
Georgi Gerganov 630c84a2bd
ggml : ggml_set_rows support quantized dst
ggml-ci
2025-06-23 13:21:35 +03:00
Georgi Gerganov df71c803b4
ggml : ggml_set_rows support broadcast 2025-06-23 13:21:35 +03:00
Georgi Gerganov 313a444b22
ggml : add ggml_is_contiguous_rows 2025-06-23 13:21:35 +03:00
Georgi Gerganov 695b6b7025
ggml : add repeat impl for i64 2025-06-23 13:21:34 +03:00
Radoslav Gerganov f2cd962fe2
use I64 for indices 2025-06-23 13:21:34 +03:00
Radoslav Gerganov c1a581a10b
ggml : add ggml_set_rows
Add ggml_set_rows(a, b, c) which copies rows from 'b' into 'a' using
indices from 'c'.

ref: #8366
2025-06-23 13:21:32 +03:00
Georgi Gerganov 7b50d589a8
kv-cells : fix tracking of seq_pos (#14339)
* kv-cells : fix tracking of seq_pos during cache reuse

ggml-ci

* cont : improve error message

ggml-ci

* cont : add more comments
2025-06-23 12:27:35 +03:00
Jeff Bolz 3a9457df96
vulkan: update windows SDK in CI (#14334) 2025-06-23 10:19:24 +02:00
Ed Addario fa4a9f2a1c
quantize : handle user-defined pruning of whole layers (blocks) (#13037) 2025-06-22 23:16:26 +02:00
Sigbjørn Skjæret 238005c2dc
gguf-py : fix SpecialVocab parsing when post_processor is null (#14330) 2025-06-22 19:46:17 +02:00
Ruikai Peng 66aba7aca9
run : avoid double tokenization (#14327)
* run : avoid double tokenization by adopting common_tokenize heuristic

* build : fix windows gcc and clang warnings

* lint : fixed trailing whitespace

* run : fix is_first flag
2025-06-23 01:28:06 +08:00
Georgi Gerganov f1f5e82df6
examples : fix is_first logic for tokenization (#14329)
ggml-ci
2025-06-22 20:10:07 +03:00
uvos af3373f1ad
HIP: enable vec fattn on RDNA4 (#14323) 2025-06-22 16:51:23 +02:00
yuiseki 5d5c066de8
mtmd : fix Pixtral OOM with large images by capping image_size to 1024 (#14326)
Mistral Small 2506 models using Pixtral vision encoder were running out
of GPU memory when processing images larger than 1024x1024 pixels due to
exponential memory growth from unlimited image size.

This fix applies the same 1024x1024 limit used by Qwen2VL models to
prevent OOM issues while maintaining compatibility with existing models.
2025-06-22 14:44:57 +02:00
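The fix above caps the longer image dimension at 1024 pixels before the vision encoder runs. A hedged sketch of that kind of proportional downscale (a hypothetical helper, not the actual mtmd/clip implementation):

```cpp
#include <algorithm>
#include <utility>

// Illustrative helper for the capping described above: if either dimension
// exceeds `max_size` (1024 in the commit), scale both dimensions down
// proportionally so the longer side equals `max_size`, preserving aspect
// ratio. This mirrors the idea of the fix, not the real mtmd code.
std::pair<int, int> cap_image_size(int width, int height, int max_size = 1024) {
    const int longer = std::max(width, height);
    if (longer <= max_size) {
        return {width, height}; // already within the limit, keep as-is
    }
    const double scale = static_cast<double>(max_size) / longer;
    return {
        std::max(1, static_cast<int>(width  * scale)),
        std::max(1, static_cast<int>(height * scale)),
    };
}
```

Because encoder memory grows with the number of image patches, bounding the input resolution bounds peak GPU memory regardless of the original image size.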
Sigbjørn Skjæret 40bfa04c95
common : use std::string_view now that we target c++17 (#14319) 2025-06-22 08:37:43 +03:00
Aman Gupta aa064b2eb7
CUDA: add mean operation (#14313)
* CUDA: add mean operation

* add back sum_rows_f32_cuda

* Review: early exit if col!=0
2025-06-22 12:39:54 +08:00
Sigbjørn Skjæret aa0ef5c578
gguf-py : fix Qwen3-Embedding eos token (#14314) 2025-06-21 18:12:05 +02:00
Markus Tavenrath bb16041cae
Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792)
* Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1 compute pipelines are getting labeled.

* remove #ifdef for debug utils and add queue marker.
2025-06-21 08:17:12 +02:00
Sigbjørn Skjæret 58cba76a9a
gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312) 2025-06-21 07:33:21 +02:00
Georgi Gerganov 67ae5312e2
metal : fix thread-safety (#14300)
ggml-ci
2025-06-21 08:04:18 +03:00
Georgi Gerganov 692e3cdd0a
memory : rename interface to llama_memory_context_i (#14296)
* memory : rename interface to llama_memory_context_i

ggml-ci

* cont : fix comments

* cont : use "mctx" for referencing a memory context

ggml-ci
2025-06-21 08:03:46 +03:00
Daniel Han b23fa0b3f4
convert : fix Llama 4 conversion (#14311) 2025-06-21 06:32:01 +02:00
Georgi Gerganov 06cbedfca1
sync : ggml
ggml-ci
2025-06-20 21:02:47 +03:00
Acly b7147673f2
Add `ggml_roll` (ggml/1274)
* ggml : add ggml_roll

* use set/get_op_params & std::min
2025-06-20 21:02:47 +03:00
David Chiu d860dd99a4
docs : fix the link to llama.h (#14293) 2025-06-20 19:43:35 +02:00
Aman Gupta c959f462a0
CUDA: add conv_2d_transpose (#14287)
* CUDA: add conv_2d_transpose

* remove direct include of cuda_fp16

* Review: add brackets for readability, remove ggml_set_param and add asserts
2025-06-20 22:48:24 +08:00
Sigbjørn Skjæret 22015b2092
lint : remove trailing whitespace (#14304) 2025-06-20 16:37:44 +02:00