Piotr Wilkin
3605e78569
Refactor into class-based approach
2026-02-14 00:17:43 +01:00
Piotr Wilkin
24cc1bcd6d
Clean algorithm for calculate_diff_split; fix buggy expectations
2026-02-13 03:17:20 +01:00
Piotr Wilkin
28fcef67c0
-> Refactor autoparser analyzer structure
...
-> Fix content truncation
-> Fix errors in capability detection due to non-empty assistant message
-> Add missing debug prints for Jinja
2026-02-13 00:55:35 +01:00
Piotr Wilkin
29ce31b1a3
Fix windows build
2026-02-11 13:47:30 +01:00
Piotr Wilkin
bd549b3b37
Fix case with object inside object, refactor long methods.
2026-02-11 13:47:29 +01:00
Piotr Wilkin
2081e9b056
Fix number partial parsing issue
2026-02-11 13:47:29 +01:00
Piotr Wilkin
b260de1d86
More edge cases
2026-02-11 13:47:29 +01:00
Piotr Wilkin
60717b3e5a
Fix pesky issue on optional trailing arguments in function calls for TAGGED format
2026-02-11 13:47:29 +01:00
Piotr Wilkin
15f7aa1fbe
We don't like segfaults (or failing tests).
2026-02-11 13:47:29 +01:00
Piotr Wilkin
09b447a487
Fix incorrect coercion of strings to non-string types during parsing
2026-02-11 13:47:29 +01:00
Piotr Wilkin
a01e15280a
Feeding the hungry editor checker god.
2026-02-11 13:47:29 +01:00
Piotr Wilkin
3770566c45
Revert bad change, fix some templates and most tests
2026-02-11 13:47:29 +01:00
Piotr Wilkin
b0853baca7
Quick vibe-coded fix for proper object printing
2026-02-11 13:47:29 +01:00
Piotr Wilkin
1662fa5bea
ANOTHER GIANT POST-FIXUP SQUISH
2026-02-11 13:47:29 +01:00
Piotr Wilkin
7e6f75a414
THE GIANT AUTOPARSER SQUISH
2026-02-11 13:47:29 +01:00
Georgi Gerganov
89181c0b6d
ggml : extend bin bcast for permuted src1 ( #19484 )
...
* tests : extend bin bcast for permuted src1
* cont : extend bin support
* cont : s0 is always 1
* tests : simplify
2026-02-11 07:52:00 +02:00
Georgi Gerganov
ceaa89b786
metal : consolidate unary ops ( #19490 )
2026-02-11 07:51:12 +02:00
Xuan-Son Nguyen
9a96352729
test: fix IMROPE perf test case ( #19465 )
2026-02-10 14:37:50 +01:00
Georgi Gerganov
a0d585537c
cuda : extend GGML_OP_PAD to work with non-cont src0 ( #19429 )
...
* cuda : extend GGML_OP_PAD to work with non-cont src0
* tests : add permuted pad
2026-02-10 08:07:16 +02:00
Hugo
1e8924fd65
cmake : add variable to skip installing tests ( #19370 )
...
When packaging downstream, there's usually little point in installing
tests. The default behaviour remains the same.
2026-02-09 07:12:02 +01:00
Jeff Bolz
db6adb3c88
tests: reduce number of FA test permutations ( #19381 )
...
Only test non-F16 for head size 64 and 72 (one a multiple of QK, one not).
2026-02-06 08:50:30 -06:00
Jeff Bolz
449ec2ab07
vulkan: Preprocess FA mask to detect all-neg-inf and all-zero. ( #19281 )
...
Write out a 2-bit code per block and avoid loading the mask when it
matches these two common cases.
Apply this optimization when the mask is relatively large (i.e. prompt
processing).
2026-02-05 09:26:38 -06:00
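The mask preprocessing described above can be illustrated with a small CPU-side sketch. This is not the Vulkan shader. It assumes a flat 1-D mask split into fixed-size blocks, and it stores one code per byte instead of packing 2-bit codes. The names `classify_mask_blocks` and `MASK_*` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical per-block codes (the real shader packs a 2-bit code per block).
enum : uint8_t { MASK_MIXED = 0, MASK_ALL_NEG_INF = 1, MASK_ALL_ZERO = 2 };

// Classify each run of `block_size` mask values so a consumer can avoid loading
// blocks that are entirely -inf (fully masked) or entirely 0 (no bias added).
std::vector<uint8_t> classify_mask_blocks(const std::vector<float> & mask, size_t block_size) {
    assert(block_size > 0);
    std::vector<uint8_t> codes;
    for (size_t start = 0; start < mask.size(); start += block_size) {
        const size_t end = std::min(start + block_size, mask.size());
        bool all_neg_inf = true;
        bool all_zero    = true;
        for (size_t i = start; i < end; ++i) {
            all_neg_inf = all_neg_inf && std::isinf(mask[i]) && mask[i] < 0.0f;
            all_zero    = all_zero    && mask[i] == 0.0f;
        }
        codes.push_back(all_neg_inf ? MASK_ALL_NEG_INF : all_zero ? MASK_ALL_ZERO : MASK_MIXED);
    }
    return codes;
}
```

A block coded `MASK_ALL_NEG_INF` contributes zero weight after the softmax, and a block coded `MASK_ALL_ZERO` adds nothing to the scores. In both cases the consumer can proceed without reading the mask values themselves.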
Georgi Gerganov
eaba92c3dc
tests : add non-cont, inplace rope tests ( #19296 )
...
* tests : add non-cont, inplace rope tests
* cont : exercise dim 3
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
* cont : more dim3 exercises
---------
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2026-02-04 12:45:21 +02:00
Sid Mohan
0dfcd3b607
jinja : add missing 'in' test to template engine ( #19004 ) ( #19239 )
...
* jinja : add missing 'in' test to template engine (#19004 )
The jinja template parser was missing the 'in' test from
global_builtins(), causing templates using reject("in", ...),
select("in", ...), or 'x is in(y)' to fail with
"selectattr: unknown test 'in'".
This broke tool-calling for Qwen3-Coder and any other model
whose chat template uses the 'in' test.
Added test_is_in supporting array, string, and object containment
checks, mirroring the existing 'in' operator logic in runtime.cpp.
Includes test cases for all three containment types plus
reject/select filter usage.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* reuse test_is_in in binary op
---------
Co-authored-by: Sid Mohan <sidmohan0@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-02-02 21:00:55 +01:00
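The three containment checks performed by the `in` test are shown below in a minimal sketch. It uses a simplified, hypothetical value type (`Value` as a `std::variant` of string, array and object). The engine's real value classes and the runtime.cpp operator logic are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified value model; the real engine's value types differ.
using Array  = std::vector<std::string>;
using Object = std::map<std::string, std::string>;
using Value  = std::variant<std::string, Array, Object>;

// Sketch of `needle is in(haystack)` for the three supported haystack kinds.
bool test_is_in(const std::string & needle, const Value & haystack) {
    if (const auto * s = std::get_if<std::string>(&haystack)) {
        return s->find(needle) != std::string::npos;            // substring containment
    }
    if (const auto * a = std::get_if<Array>(&haystack)) {
        return std::find(a->begin(), a->end(), needle) != a->end();  // element equality
    }
    const auto & o = std::get<Object>(haystack);
    return o.count(needle) > 0;                                  // key membership
}
```

A filter such as `select("in", y)` would apply this same check to each item in turn, using the item as the needle.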
Aman Gupta
9f682fb640
ggml-cpu: FA split across kv for faster TG ( #19209 )
...
* ggml-cpu: split across kv for faster TG
* simplify sinks application
* add ref impl
2026-02-03 01:19:55 +08:00
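Splitting flash attention across the KV dimension works because partial softmax results can be merged exactly. Each chunk keeps three values: its running maximum score m, the sum l of exp(s - m), and the weighted sum o of exp(s - m) * v. Two chunks merge by rescaling both onto M = max(m_a, m_b). The sketch below shows this for a single query with scalar values. `Partial`, `attend_chunk`, `merge` and `attention_split_kv` are hypothetical names. The chunks run sequentially here, whereas a real kernel would assign each chunk to its own thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Partial attention state for one KV chunk (scalar values for brevity).
struct Partial {
    float m = -std::numeric_limits<float>::infinity(); // max score in the chunk
    float l = 0.0f;                                     // sum of exp(s - m)
    float o = 0.0f;                                     // sum of exp(s - m) * v
};

Partial attend_chunk(const std::vector<float> & s, const std::vector<float> & v, size_t begin, size_t end) {
    Partial p;
    for (size_t i = begin; i < end; ++i) p.m = std::max(p.m, s[i]);
    for (size_t i = begin; i < end; ++i) {
        const float w = std::exp(s[i] - p.m);
        p.l += w;
        p.o += w * v[i];
    }
    return p;
}

// Combine two partial results by rescaling both onto the larger maximum.
Partial merge(const Partial & a, const Partial & b) {
    if (a.l == 0.0f) return b;   // empty chunk contributes nothing
    if (b.l == 0.0f) return a;
    Partial r;
    r.m = std::max(a.m, b.m);
    const float ca = std::exp(a.m - r.m);
    const float cb = std::exp(b.m - r.m);
    r.l = a.l * ca + b.l * cb;
    r.o = a.o * ca + b.o * cb;
    return r;
}

float attention_split_kv(const std::vector<float> & s, const std::vector<float> & v, size_t n_chunks) {
    assert(n_chunks > 0 && !s.empty() && s.size() == v.size());
    const size_t n = s.size();
    const size_t chunk = (n + n_chunks - 1) / n_chunks;
    Partial acc;
    for (size_t c = 0; c < n_chunks; ++c) {
        const size_t begin = std::min(c * chunk, n);
        const size_t end   = std::min(begin + chunk, n);
        acc = merge(acc, attend_chunk(s, v, begin, end));
    }
    return acc.o / acc.l;        // final softmax-weighted average
}
```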
Christian Kastner
7a4ca3cbd9
docs : Minor cleanups ( #19252 )
...
* Update old URLs to github.com/ggml-org/
* Bump copyrights
2026-02-02 08:38:55 +02:00
Georgi Gerganov
c3b87cebff
tests : add GQA=20 FA test ( #19095 )
2026-01-30 13:52:57 +02:00
Aldehir Rojas
7b7ae857f6
chat : add parsing for solar-open-100b ( #18540 )
...
* chat : add parsing for solar-open-100b
* add comments to rules
* cont : make assistant start optional
* cont : remove assistant start prefix altogether
---------
Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-01-29 16:06:15 +01:00
Sigbjørn Skjæret
b45ef2702c
jinja : do not pass empty tools and add some none filters ( #19176 )
2026-01-29 14:06:54 +01:00
Sigbjørn Skjæret
60368e1d73
jinja : undefined should be treated as sequence/iterable (return string/array) by filters/tests ( #19147 )
...
* undefined is treated as iterable (string/array) by filters
`tojson` is not a supported `undefined` filter
* add tests
* add sequence and iterable tests
keep it DRY and fix some types
2026-01-28 14:40:29 +01:00
Sigbjørn Skjæret
2b4cbd2834
jinja : implement mixed type object keys ( #18955 )
...
* implement mixed type object keys
* add tests
* refactor
* minor fixes
* massive refactor
* add more tests
* forgotten tuples
* fix array/object is_hashable
* correct (albeit broken) jinja responses
verified with transformers
* improved hashing and equality
* refactor hash function
* more exhaustive test case
* clean up
* cont
* cont (2)
* missing cstring
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-01-27 19:50:42 +01:00
Johannes Gäßler
b0311c16d2
CUDA: fix padding of GQA to power of 2 in FA ( #19115 )
2026-01-26 23:24:58 +01:00
Johannes Gäßler
4e5b83b226
GGUF: check that tensor size is representable ( #19072 )
2026-01-24 21:57:51 +01:00
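One way to check that a tensor's size is representable is to multiply the dimensions with an overflow guard before computing the byte count. This sketch rests on several assumptions. It checks against `INT64_MAX`. It treats a type as `type_size` bytes per block of `blck_size` elements. Finally, `tensor_size_representable` is a hypothetical name, not the GGUF loader's function.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical check: true if ne[0] * ... * ne[n_dims-1] elements, stored as
// `type_size` bytes per block of `blck_size` elements, fit in an int64_t byte count.
bool tensor_size_representable(const int64_t * ne, int n_dims, int64_t type_size, int64_t blck_size) {
    assert(type_size > 0 && blck_size > 0);
    const int64_t max = std::numeric_limits<int64_t>::max();
    int64_t n_elements = 1;
    for (int i = 0; i < n_dims; ++i) {
        if (ne[i] < 0) {
            return false;                          // negative dimensions are invalid
        }
        if (ne[i] != 0 && n_elements > max / ne[i]) {
            return false;                          // element count would overflow
        }
        n_elements *= ne[i];
    }
    const int64_t n_blocks = n_elements / blck_size;
    return n_blocks <= max / type_size;            // byte count stays in range
}
```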
Xuan-Son Nguyen
51fa458a92
server : support preserving reasoning_content in assistant message ( #18994 )
...
* support reasoning_content input
* report template caps to webui
* add docs
* rm commented code
2026-01-22 21:30:06 +01:00
Georgi Gerganov
a5eaa1d6a3
mla : make the V tensor a view of K ( #18986 )
...
* mla : pass V as a view of K to the FA op
* cuda : adjust mla logic to new layout
* kv-cache : fix rope shift
* tests : remove comment
* cuda : fix reusable_cutoff
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-01-22 22:09:01 +02:00
Piotr Wilkin (ilintar)
c301172f66
jinja: support none|string ( #18995 )
...
* jinja: support none|string
* Update common/jinja/value.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-jinja.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Add as_string()
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-21 19:24:37 +01:00
Jeff Bolz
33f890e579
vulkan: support flash attention GQA/split_k with small batches ( #18938 )
2026-01-21 17:43:43 +01:00
Xuan-Son Nguyen
2c1f199653
cli : fix reasoning responses in CLI ( #18961 )
...
* cli : fix reasoning responses in CLI
* fix build
* fix build (2)
2026-01-20 18:23:25 +01:00
Sigbjørn Skjæret
959ecf7f23
jinja : fix undefined keys and attributes and int/float as bool ( #18924 )
...
* fix undefined keys and attributes
* add falsy tests
* as_bool for integers and floats
* more falsy/truthy tests
* --typo
2026-01-19 20:29:43 +01:00
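The truthiness rule for numbers treats zero as falsy and any other value as truthy, as in Python and Jinja. It can be sketched with a simplified, hypothetical `Scalar` variant. The engine's actual `as_bool` is defined on its own value classes and covers more types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical simplified value: none/undefined, bool, int, float, string.
using Scalar = std::variant<std::monostate, bool, int64_t, double, std::string>;

// Python/Jinja-style truthiness for the scalar cases.
bool as_bool(const Scalar & v) {
    return std::visit([](const auto & x) -> bool {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, std::monostate>) {
            return false;                 // none / undefined are falsy
        } else if constexpr (std::is_same_v<T, bool>) {
            return x;
        } else if constexpr (std::is_same_v<T, std::string>) {
            return !x.empty();            // only the empty string is falsy
        } else {
            return x != 0;                // int64_t and double: zero is falsy
        }
    }, v);
}
```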
Sigbjørn Skjæret
4037093c66
ci : run test-jinja -py on high perf [no ci] ( #18916 )
2026-01-19 20:29:15 +01:00
Xuan-Son Nguyen
fe44d35574
tests : add test-jinja -py option for cross-checking ( #18906 )
...
* tests : add test-jinja -py option for cross-checking
* Update tests/test-jinja.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* fix + add source
* SandboxedEnvironment
* fix array.map case
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-18 08:14:27 +01:00
Sigbjørn Skjæret
d03c45c9c5
jinja : attribute support for join, map and sort ( #18883 )
...
* support negative array index and default value
* attribute support (int and str) for join, map and sort
* add tests
* update CODEOWNERS
* improve fixme sorting comment
2026-01-18 02:53:01 +01:00
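Python-style negative indexing maps -1 to the last element and -size to the first, and leaves anything else out of range. The commit does not spell out where its default value applies. The `at_or_default` helper below is therefore only a hypothetical illustration of falling back to a default. `normalize_index` is also a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Python-style index normalization: -1 is the last element, -size the first.
std::optional<size_t> normalize_index(int64_t idx, size_t size) {
    const int64_t n = static_cast<int64_t>(size);
    if (idx < 0) idx += n;
    if (idx < 0 || idx >= n) return std::nullopt;   // still out of range
    return static_cast<size_t>(idx);
}

// Hypothetical helper: index lookup that falls back to `def` when out of range.
template <typename T>
T at_or_default(const std::vector<T> & arr, int64_t idx, const T & def) {
    const std::optional<size_t> i = normalize_index(idx, arr.size());
    return i ? arr[*i] : def;
}
```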
Sigbjørn Skjæret
10c98cbdf6
jinja : add missing tojson filter for bool ( #18900 )
...
* add missing tojson for bool
* add more literal tests
2026-01-18 01:05:09 +01:00
Sigbjørn Skjæret
420960ab92
jinja : fix lexing of float literals with sign ( #18901 )
...
* fix lexing of float literals with sign
* add test
* consume_numeric
2026-01-18 00:57:51 +01:00
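A lexer can accept an optional leading sign as part of a numeric literal by consuming it before the digits. The hypothetical sketch below is loosely named after the `consume_numeric` mentioned in the commit, whose actual code is not shown. Its grammar (`[+-]?digits(.digits)?`, no exponents) is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

struct NumericToken {
    std::string text;
    bool is_float;
};

// Consume [+-]?digits(.digits)? starting at `pos`; on success advance `pos`.
std::optional<NumericToken> consume_numeric(const std::string & src, size_t & pos) {
    auto is_digit = [&](size_t i) { return i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])); };
    size_t i = pos;
    std::string text;
    if (i < src.size() && (src[i] == '+' || src[i] == '-')) text += src[i++];  // optional sign
    const size_t digits_start = i;
    while (is_digit(i)) text += src[i++];
    if (i == digits_start) return std::nullopt;        // a sign alone is not a number
    bool is_float = false;
    if (i < src.size() && src[i] == '.' && is_digit(i + 1)) {  // fraction needs a digit after '.'
        is_float = true;
        text += src[i++];
        while (is_digit(i)) text += src[i++];
    }
    pos = i;
    return NumericToken{text, is_float};
}
```

A real lexer must also decide from context whether `-` is a sign or a binary operator (for example in `a-1`). This sketch leaves that decision to the caller.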
Xuan-Son Nguyen
f55b033ae6
jinja: correct member access rule ( #18905 )
2026-01-18 00:48:55 +01:00
Thore Koritzius
388ce82241
ggml : extend ggml_pool_1d + metal ( #16429 )
...
* chore: resolve conflicts
* feat: ggml metal impl
* fix: ggml_metal_kargs_pool_1d struct
* fix: require contiguous input
* chore: test pool_1d
* chore: limit pool1d test cases to p0=0 and s0=k0 to conform with asserts
* chore: add p0 and s0 to testing
* fix: allow padding for cpu and metal
* Update ggml/src/ggml-metal/ggml-metal.metal
* fix: correct single-threaded loop
* ggml : cleanup
* tests : add ne[1] != 1 tests
* fix: ne[1] handling in np
* cont : fixes
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-16 16:59:56 +02:00
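Take kernel size `k0`, stride `s0` and padding `p0`. The standard output-size formula then gives (IW + 2*p0 - k0) / s0 + 1 outputs for a row of width IW. The hypothetical reference below shows max pooling over one row, with padded positions skipped. It is a CPU sketch, not the ggml or Metal kernel, and it does not cover how padding affects average pooling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference: 1-D max pooling over one row with kernel k0, stride s0, padding p0.
std::vector<float> max_pool_1d(const std::vector<float> & x, int k0, int s0, int p0) {
    assert(k0 > 0 && s0 > 0 && p0 >= 0);
    const int iw = static_cast<int>(x.size());
    const int ow = (iw + 2 * p0 - k0) / s0 + 1;          // pooling output width
    std::vector<float> y(static_cast<size_t>(std::max(ow, 0)));
    for (int o = 0; o < ow; ++o) {
        float best = -std::numeric_limits<float>::infinity();
        for (int k = 0; k < k0; ++k) {
            const int i = o * s0 - p0 + k;               // input position under this tap
            if (i >= 0 && i < iw) {
                best = std::max(best, x[static_cast<size_t>(i)]);  // padded taps are skipped
            }
        }
        y[static_cast<size_t>(o)] = best;
    }
    return y;
}
```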
Xuan-Son Nguyen
c15395f73c
common : implement new jinja template engine ( #18462 )
...
* jinja vm
* lexer
* add vm types
* demo
* clean up
* parser ok
* binary_expression::execute
* shadow naming
* bin ops works!
* fix map object
* add string builtins
* add more builtins
* wip
* use mk_val
* eval with is_user_input
* render gemma tmpl ok
* track input string even after transformations
* support bound functions
* keyword arguments and slicing array
* use shared_ptr for values
* add mk_stmt
* allow print source on exception
* fix negate test
* testing more templates
* mostly works
* add filter_statement
* allow func to access ctx
* add jinja-value.cpp
* impl global_from_json
* a lot of fixes
* more tests
* more fix, more tests
* more fixes
* rm workarounds
* demo: type inference
* add placeholder for tojson
* improve function args handling
* rm type inference
* no more std::regex
* trailing spaces
* make testing more flexible
* make output a bit cleaner
* (wip) redirect minja calls
* test: add --output
* fix crash on macro kwargs
* add minimal caps system
* add some workarounds
* rm caps_apply_workarounds
* get rid of preprocessing
* more fixes
* fix test-chat-template
* move test-chat-jinja into test-chat-template
* rm test-chat-jinja from cmake
* test-chat-template: use common
* fix build
* fix build (2)
* rename vm --> interpreter
* improve error reporting
* correct lstrip behavior
* add tojson
* more fixes
* disable tests for COMMON_CHAT_FORMAT_GENERIC
* make sure tojson output correct order
* add object.length
* fully functional selectattr / rejectattr
* improve error reporting
* more builtins added, more fixes
* create jinja rendering tests
* fix testing.h path
* adjust whitespace rules
* more fixes
* temporarily disable test for ibm-granite
* r/lstrip behavior matched with hf.js
* minimax, glm4.5 ok
* add append and pop
* kimi-k2 ok
* test-chat passed
* fix lstrip_block
* add more jinja tests
* cast to unsigned char
* allow dict key to be numeric
* nemotron: rm windows newline
* tests ok
* fix test
* rename interpreter --> runtime
* fix build
* add more checks
* bring back generic format support
* fix Apertus
* [json.exception.out_of_range.403] key 'content' not found
* rm generic test
* refactor input marking
* add docs
* fix windows build
* clarify error message
* improved tests
* split/rsplit with maxsplit
* non-inverse maxsplit
forgot to change after simplifying
* implement separators for tojson and fix indent
* i like to move it move it
* rename null --> none
* token::eof
* some nits + comments
* add exception classes for lexer and parser
* null -> none
* rename global -> env
* rm minja
* update docs
* docs: add input marking caveats
* implement missing jinja-tests functions
* oops
* support trim filter with args, remove bogus to_json reference
* numerous argument fixes
* updated tests
* implement optional strip chars parameter
* use new chars parameter
* float filter also has default
* always leave at least one decimal in float string
* jinja : static analysis + header cleanup + minor fixes
* add fuzz test
* add string.cpp
* fix chat_template_kwargs
* nits
* fix build
* revert
* unrevert
sorry :)
* add fuzz func_args, refactor to be safer
* fix array.map()
* loosen ensure_vals max count condition, add not impl for map(int)
* hopefully fix windows
* check if empty first
* normalize newlines
---------
Co-authored-by: Alde Rojas <hello@alde.dev>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-16 11:22:06 +01:00
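The maxsplit handling for `rsplit` follows the semantics of Python's `str.rsplit`. At most `maxsplit` splits are made, working from the right, and a negative value means no limit. The minimal sketch below handles only an explicit, non-empty separator; whitespace splitting without a separator is not covered. `rsplit` here is a hypothetical free function, not the engine's builtin.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Python-like str.rsplit(sep, maxsplit); maxsplit < 0 means "no limit".
std::vector<std::string> rsplit(const std::string & s, const std::string & sep, int maxsplit) {
    assert(!sep.empty());
    std::vector<std::string> parts;
    size_t end = s.size();                                 // right edge of the unsplit prefix
    while (maxsplit != 0 && end >= sep.size()) {
        // last occurrence of sep that ends at or before `end`
        const size_t pos = s.rfind(sep, end - sep.size());
        if (pos == std::string::npos) break;
        parts.push_back(s.substr(pos + sep.size(), end - pos - sep.size()));
        end = pos;
        if (maxsplit > 0) --maxsplit;
    }
    parts.push_back(s.substr(0, end));                     // remaining left part
    std::reverse(parts.begin(), parts.end());              // collected right-to-left
    return parts;
}
```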
Adrien Gallouët
ec997b4f2b
tests : download models only when running ctest ( #18843 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-15 09:47:29 +01:00
Oliver Simons
36f0132464
CUDA: Factor out and re-use `block_reduce` function ( #18785 )
...
* CUDA: Refactor and expose two_stage_warp_reduce_* function
* Use `two_stage_warp_reduce` also in softmax kernel, move smem out of it
Moving smem out of the `__device__` function into the `__global__` function
allows explicit smem reuse, as either the compiler or the CUDA runtime does
not seem to free it afterwards (`cudaFuncSetAttribute` fails unless it is
accounted for once per call to two_stage_warp_reduce)
* Update ggml/src/ggml-cuda/common.cuh
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
* Use two_stage_warp_reduce in group_norm_f32
* Use two_stage_warp_reduce in rms_norm_f32
* Fix smem calculation which expects bytes
* Make `two_stage_warp_reduce` accept all values warp_reduce accepts
Also integrate it into norm_f32 function
* Use two_stage_warp_reduce in l2_norm_f32
* Use type traits for block reduction for better legibility
Also address other requests by @am17an such as variable renaming
* Make norm tests cover all cuda paths
* Mark columns % WARP_SIZE !=0 as supported for RMS_NORM_BACK
Unit-tests passed locally, let's see if they pass in the CI as well
* Use `enum class` for `block_reduce_method`
This is more type-safe than a plain enum
* Rename variables as suggested in code review by @am17an
* Rename two_stage_warp_reduce -> block_reduce
* Fix trailing whitespace in common.cuh
* Make condition of static_assert type-dependent
This delays evaluation until the template is actually instantiated.
Otherwise, some compilers may evaluate the assert when parsing the
template, resulting in build errors as observed here:
https://github.com/ggml-org/llama.cpp/actions/runs/20960323123/job/60235530068?pr=18785
* Inline definitions
---------
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
2026-01-15 10:44:54 +08:00
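The type-dependent `static_assert` described above is a common C++ idiom. Some compilers reject a plain `static_assert(false, ...)` even in a template branch that is never instantiated. Making the condition depend on a template parameter defers the check until instantiation. The host-side sketch below shows the pattern. Only the name `block_reduce_method` comes from the commit; `dependent_false_v`, `reduce_op` and the `SUM`/`MAX` enumerators are hypothetical. It requires C++17.

```cpp
#include <cassert>

// A false value that depends on T, so the assert below fires only on instantiation.
template <typename T>
inline constexpr bool dependent_false_v = false;

enum class block_reduce_method { SUM, MAX };

template <block_reduce_method method, typename T>
T reduce_op(T a, T b) {
    if constexpr (method == block_reduce_method::SUM) {
        return a + b;
    } else if constexpr (method == block_reduce_method::MAX) {
        return a > b ? a : b;
    } else {
        // Evaluated only if this branch is instantiated for an unsupported method.
        static_assert(dependent_false_v<T>, "unsupported block_reduce_method");
    }
}
```

Using `enum class` here also means a `block_reduce_method` value cannot be silently mixed with plain integers, which matches the type-safety motivation given in the commit.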
Adrien Gallouët
f709c7a33f
ci, tests : use cmake to download models and remove libcurl dependency ( #18791 )
...
* ci, tests : use cmake to download models and remove libcurl dependency
* llama_dl_model -> llama_download_model
* use EXPECTED_HASH for robust model downloading
* Move llama_download_model to cmake/common.cmake
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-14 07:46:27 +01:00