Commit Graph

671 Commits

Author SHA1 Message Date
Piotr Wilkin 562fe10dd5 Add proper reasoning detection for cli 2026-02-16 22:39:12 +01:00
Piotr Wilkin 4f61ee302d Windows compilation issues 2026-02-16 22:39:12 +01:00
Piotr Wilkin bcd72bc8ac Add "marker" PEG parser + sample in analyzer 2026-02-16 22:39:12 +01:00
Piotr Wilkin ba8f7dcdcb Basic universal PEG parser wrapper with tag-to-dict based extractor 2026-02-16 22:39:12 +01:00
Piotr Wilkin 3d27add7b1 Refactor into class-based approach 2026-02-16 22:39:12 +01:00
Piotr Wilkin 9c8cf1bc43 Clean algorithm for calculate_diff_split; fix buggy expectations 2026-02-16 22:39:12 +01:00
Piotr Wilkin b0ed986aec -> Refactor autoparser analyzer structure
-> Fix content truncation
-> Fix errors in capability detection due to non-empty assistant message
-> Add missing debug prints for Jinja
2026-02-16 22:39:12 +01:00
Piotr Wilkin 964972d64e Fix windows build 2026-02-16 22:39:12 +01:00
Piotr Wilkin 5164f2f3c8 Fix case with object inside object, refactor long methods. 2026-02-16 22:39:12 +01:00
Piotr Wilkin 8397fdddc6 Fix number partial parsing issue 2026-02-16 22:39:12 +01:00
Piotr Wilkin 5df5390c72 More edge cases 2026-02-16 22:39:12 +01:00
Piotr Wilkin 971b216ce1 Fix pesky issue on optional trailing arguments in function calls for TAGGED format 2026-02-16 22:39:11 +01:00
Piotr Wilkin b223a7b1aa We don't like segfaults (or failing tests). 2026-02-16 22:39:11 +01:00
Piotr Wilkin 0abe32a3d8 Fix incorrect coercion of strings to non-string types during parsing 2026-02-16 22:39:11 +01:00
Piotr Wilkin f1937febff Feeding the hungry editor checker god. 2026-02-16 22:39:11 +01:00
Piotr Wilkin 5cabb3c737 Reverd bad change fix some templates and most tests 2026-02-16 22:39:11 +01:00
Piotr Wilkin 2eedbb24e0 Quick vibe-coded fix for proper object printing 2026-02-16 22:39:11 +01:00
Piotr Wilkin 1e3d93cb6b ANOTHER GIANT POST-FIXUP SQUISH 2026-02-16 22:39:11 +01:00
Piotr Wilkin 52d31fa024 THE GIANT AUTOPARSER SQUISH 2026-02-16 22:39:11 +01:00
Georgi Gerganov 08e6d914b8
ggml : avoid UB in gemm ukernel (#19642) 2026-02-15 14:56:35 +02:00
Jeff Bolz dbb023336b
vulkan: support L2_NORM with contiguous rows (#19604) 2026-02-14 06:42:04 +01:00
ymcki 0e21991472
fix vulkan ggml_acc only works in 3d but not 4d (#19426)
* fix vulkan ggml_acc only works in 3d but not 4d

* removed clamp in test_acc_block

* use the correct stride and its test case

* cuda : fix "supports op" condition

* change src0 to src1 in ggml_vk_acc. Update acc.comp with jeffbolznv\'s suggestion except to keep the boundary check

* version without boundary check

* revert back to boundary check version

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-02-13 13:31:37 +01:00
Georgi Gerganov 490eb96b88
metal : support GGML_OP_SET (#19548) 2026-02-13 07:34:52 +02:00
Georgi Gerganov 3b3a948134
metal : update sum_rows kernel to support float4 (#19524) 2026-02-12 11:35:28 +02:00
Georgi Gerganov 914dde72ba
ggml : unary ops support non-cont src0 + metal F16 unary ops (#19511)
* ggml : unary ops support non-cont src0

* metal : support F16 unary ops + fix ELU
2026-02-11 18:58:43 +02:00
Georgi Gerganov 89181c0b6d
ggml : extend bin bcast for permuted src1 (#19484)
* tests : extend bin bcast for permuted src1

* cont : extend bin support

* cont : s0 is always 1

* tests : simplify
2026-02-11 07:52:00 +02:00
Georgi Gerganov ceaa89b786
metal : consolidate unary ops (#19490) 2026-02-11 07:51:12 +02:00
Xuan-Son Nguyen 9a96352729
test: fix IMROPE perf test case (#19465) 2026-02-10 14:37:50 +01:00
Georgi Gerganov a0d585537c
cuda : extend GGML_OP_PAD to work with non-cont src0 (#19429)
* cuda : extend GGML_OP_PAD to work with non-cont src0

* tests : add permuted pad
2026-02-10 08:07:16 +02:00
Hugo 1e8924fd65
cmake : add variable to skip installing tests (#19370)
When packaging downstream, there's usually little point in installing
test. The default behaviour remains the same.
2026-02-09 07:12:02 +01:00
Jeff Bolz db6adb3c88
tests: reduce number of FA test permutations (#19381)
Only test non-F16 for head size 64 and 72 (one a multiple of QK, one not).
2026-02-06 08:50:30 -06:00
Jeff Bolz 449ec2ab07
vulkan: Preprocess FA mask to detect all-neg-inf and all-zero. (#19281)
Write out a 2-bit code per block and avoid loading the mask when it
matches these two common cases.

Apply this optimization when the mask is relatively large (i.e. prompt
processing).
2026-02-05 09:26:38 -06:00
Georgi Gerganov eaba92c3dc
tests : add non-cont, inplace rope tests (#19296)
* tests : add non-cont, inplace rope tests

* cont : exercise dim 3

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>

* cont : more dim3 exercises

---------

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2026-02-04 12:45:21 +02:00
Sid Mohan 0dfcd3b607
jinja : add missing 'in' test to template engine (#19004) (#19239)
* jinja : add missing 'in' test to template engine (#19004)

The jinja template parser was missing the 'in' test from
global_builtins(), causing templates using reject("in", ...),
select("in", ...), or 'x is in(y)' to fail with
"selectattr: unknown test 'in'".

This broke tool-calling for Qwen3-Coder and any other model
whose chat template uses the 'in' test.

Added test_is_in supporting array, string, and object containment
checks, mirroring the existing 'in' operator logic in runtime.cpp.

Includes test cases for all three containment types plus
reject/select filter usage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* reuse test_is_in in binary op

---------

Co-authored-by: Sid Mohan <sidmohan0@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-02-02 21:00:55 +01:00
Aman Gupta 9f682fb640
ggml-cpu: FA split across kv for faster TG (#19209)
* ggml-cpu: split across kv for faster TG

* simplify sinks application

* add ref impl
2026-02-03 01:19:55 +08:00
Christian Kastner 7a4ca3cbd9
docs : Minor cleanups (#19252)
* Update old URLs to github.com/ggml-org/

* Bump copyrights
2026-02-02 08:38:55 +02:00
Georgi Gerganov c3b87cebff
tests : add GQA=20 FA test (#19095) 2026-01-30 13:52:57 +02:00
Aldehir Rojas 7b7ae857f6
chat : add parsing for solar-open-100b (#18540)
* chat : add parsing for solar-open-100b

* add comments to rules

* cont : make assistant start optional

* cont : remove assistant start prefix altogether

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-01-29 16:06:15 +01:00
Sigbjørn Skjæret b45ef2702c
jinja : do not pass empty tools and add some none filters (#19176) 2026-01-29 14:06:54 +01:00
Sigbjørn Skjæret 60368e1d73
jinja : undefined should be treated as sequence/iterable (return string/array) by filters/tests (#19147)
* undefined is treated as iterable (string/array) by filters

`tojson` is not a supported `undefined` filter

* add tests

* add sequence and iterable tests

keep it DRY and fix some types
2026-01-28 14:40:29 +01:00
Sigbjørn Skjæret 2b4cbd2834
jinja : implement mixed type object keys (#18955)
* implement mixed type object keys

* add tests

* refactor

* minor fixes

* massive refactor

* add more tests

* forgotten tuples

* fix array/object is_hashable

* correct (albeit broken) jinja responses

verified with transformers

* improved hashing and equality

* refactor hash function

* more exhausive test case

* clean up

* cont

* cont (2)

* missing cstring

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-01-27 19:50:42 +01:00
Johannes Gäßler b0311c16d2
CUDA: fix padding of GQA to power of 2 in FA (#19115) 2026-01-26 23:24:58 +01:00
Johannes Gäßler 4e5b83b226
GGUF: check that tensor size is representable (#19072) 2026-01-24 21:57:51 +01:00
Xuan-Son Nguyen 51fa458a92
server : support preserving reasoning_content in assistant message (#18994)
* support reasoning_content input

* report template caps to webui

* add docs

* rm commented code
2026-01-22 21:30:06 +01:00
Georgi Gerganov a5eaa1d6a3
mla : make the V tensor a view of K (#18986)
* mla : pass V as a view of K to the FA op

* cuda : adjust mla logic to new layout

* kv-cache : fix rope shift

* tests : remove comment

* cuda : fix reusable_cutoff

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-01-22 22:09:01 +02:00
Piotr Wilkin (ilintar) c301172f66
jinja: support none|string (#18995)
* jinja: support none|string

* Update common/jinja/value.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-jinja.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Add as_string()

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-21 19:24:37 +01:00
Jeff Bolz 33f890e579
vulkan: support flash attention GQA/split_k with small batches (#18938) 2026-01-21 17:43:43 +01:00
Xuan-Son Nguyen 2c1f199653
cli : fix reasoning responses in CLI (#18961)
* cli : fix reasoning responses in CLI

* fix build

* fix build (2)
2026-01-20 18:23:25 +01:00
Sigbjørn Skjæret 959ecf7f23
jinja : fix undefined keys and attributes and int/float as bool (#18924)
* fix undefined keys and attributes

* add falsy tests

* as_bool for integers and floats

* more falsy/truthy tests

* --typo
2026-01-19 20:29:43 +01:00
Sigbjørn Skjæret 4037093c66
ci : run test-jinja -py on high perf [no ci] (#18916) 2026-01-19 20:29:15 +01:00