Commit Graph

8108 Commits

Author SHA1 Message Date
Piotr Wilkin 13a350fa1a Whitespace 2026-02-16 22:39:12 +01:00
Piotr Wilkin b0ed986aec -> Refactor autoparser analyzer structure
-> Fix content truncation
-> Fix errors in capability detection due to non-empty assistant message
-> Add missing debug prints for Jinja
2026-02-16 22:39:12 +01:00
Piotr Wilkin 2da282018e Whoops 2026-02-16 22:39:12 +01:00
Piotr Wilkin 58d822ca0c One more crazy spacing out 2026-02-16 22:39:12 +01:00
Piotr Wilkin f8b0b75a00 Get rid of some crazy formatting 2026-02-16 22:39:12 +01:00
Piotr Wilkin 18054b4e44 Document helpers 2026-02-16 22:39:12 +01:00
Piotr Wilkin ffdce9ca29 Add compilation guard to fix Windows compilation errors 2026-02-16 22:39:12 +01:00
Piotr Wilkin 5e38bac7a3 Post-merge adapt 2026-02-16 22:39:12 +01:00
Piotr Wilkin ed82289609 Revert obsolete server-context change 2026-02-16 22:39:12 +01:00
Piotr Wilkin 964972d64e Fix windows build 2026-02-16 22:39:12 +01:00
Piotr Wilkin 0a2090a8d6 Regenerate documentation 2026-02-16 22:39:12 +01:00
Piotr Wilkin 5164f2f3c8 Fix case with object inside object, refactor long methods. 2026-02-16 22:39:12 +01:00
Piotr Wilkin 8397fdddc6 Fix number partial parsing issue 2026-02-16 22:39:12 +01:00
Piotr Wilkin 5df5390c72 More edge cases 2026-02-16 22:39:12 +01:00
Piotr Wilkin 971b216ce1 Fix pesky issue on optional trailing arguments in function calls for TAGGED format 2026-02-16 22:39:11 +01:00
Piotr Wilkin fcc61e6759 Remove [[noreturn]] as it causes compilation problems on Mac. 2026-02-16 22:39:11 +01:00
Piotr Wilkin b223a7b1aa We don't like segfaults (or failing tests). 2026-02-16 22:39:11 +01:00
Piotr Wilkin 4249e9889f Fix minor regressions, add [[noreturn]] attrib 2026-02-16 22:39:11 +01:00
Piotr Wilkin 0abe32a3d8 Fix incorrect coercion of strings to non-string types during parsing 2026-02-16 22:39:11 +01:00
Piotr Wilkin f1937febff Feeding the hungry editor checker god. 2026-02-16 22:39:11 +01:00
Piotr Wilkin c35b31abe5 Fix error in argument processing 2026-02-16 22:39:11 +01:00
Piotr Wilkin 5cabb3c737 Revert bad change, fix some templates and most tests 2026-02-16 22:39:11 +01:00
Piotr Wilkin bb6337fb90 More robust reasoning detection 2026-02-16 22:39:11 +01:00
Piotr Wilkin 169a0fa0f6 Fix reasoning detection 2026-02-16 22:39:11 +01:00
Piotr Wilkin 2eedbb24e0 Quick vibe-coded fix for proper object printing 2026-02-16 22:39:11 +01:00
Piotr Wilkin a4feadb10d Missed this. 2026-02-16 22:39:11 +01:00
Piotr Wilkin 1e3d93cb6b ANOTHER GIANT POST-FIXUP SQUISH 2026-02-16 22:39:11 +01:00
Piotr Wilkin 52d31fa024 THE GIANT AUTOPARSER SQUISH 2026-02-16 22:39:11 +01:00
Piotr Wilkin 052ad2ab8a Make call IDs nine-character 2026-02-16 22:39:11 +01:00
Piotr Wilkin 47a7ebc0c1 Fix sanitizer warnings 2026-02-16 22:39:11 +01:00
Piotr Wilkin b403c9aaa2 Fix bad typo 2026-02-16 22:39:11 +01:00
Piotr Wilkin f2a4ae6ba8 Add workaround for templates requiring non-null content 2026-02-16 22:39:11 +01:00
AesSedai d612901116
perplexity: add proper batching (#19661) 2026-02-16 18:44:44 +02:00
Ivan Chikish cceb1b4e33
common : inline functions (#18639) 2026-02-16 17:52:24 +02:00
Judd d23a55997d
ggml : make `ggml_is_view` as API (#19539)
* make `ggml_is_view` as API

* introduce `ggml_aux_is_view` as an inline version for internal use.

* rename `ggml_aux_is_view` to `ggml_impl_is_view`
2026-02-16 17:43:34 +02:00
Saurabh Dash 5f28c53d11
model: Add support for Tiny Aya Models (#19611)
* changes for tiny aya

* changes to hash

* changes to vocab

* fix some tokenizer regex edge cases

* update comment

* add some comments for regex

* Apply suggestion from @ngxson

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-02-16 16:28:46 +01:00
Adrien Gallouët 4408494144
build : rework llama_option_depr to handle LLAMA_CURL (#19658)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-16 16:06:48 +01:00
Mario Limonciello 2ba9adc093
Adjust workaround for ROCWMMA_FATTN/GFX9 to apply only to newer ROCm versions (#19591)
Avoids issues with ROCm 6.4.4.

Closes: https://github.com/ggml-org/llama.cpp/issues/19580
Fixes: 6845f7f87 ("Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (#19461)")

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2026-02-16 14:46:08 +01:00
Georgi Gerganov cc45f2ada6
models : deduplicate delta-net graphs for Qwen family (#19597)
* models : add llm_build_delta_net_base

* cont : keep qwen35 and qwen35moe graphs intact

* cont : add comments
2026-02-16 14:35:04 +02:00
Georgi Gerganov d5dfc33027
graph : fix KQ mask, lora, cvec reuse checks (#19644)
* graph : fix KQ mask reuse condition

* cont : dedup KQ mask build and can_reuse

* cont : fix build

* graph : fix adapter check for reuse
2026-02-16 09:21:11 +02:00
abhijain1204fujitsu 267ba5a1d9
ggml: aarch64: Implement SVE in Gemm q4_k 8x8 q8_k Kernel (#19132)
* Updated repack.cpp

* Updated repack.cpp

* Updated repack.cpp

* Added an if condition so that only a vector length of 256 bits is supported.

* Changed the format; removed comments and a duplicate variable

* If SVE 256 was not present, the generic function was used for the computation, which slowed performance.

So code was added to use the NEON path when SVE 256 is not present.

* Code format change suggestion

---------

Co-authored-by: Vithule, Prashant <Prashant.Vithule@fujitsu.com>
2026-02-16 14:38:43 +08:00
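The kernel-selection rule described in this commit (use the SVE kernel only at a 256-bit vector length, and otherwise fall back to NEON instead of the slow generic path) can be sketched as below. This is a minimal illustration under stated assumptions: `select_gemm_kernel` and the `Kernel` names are hypothetical, and the real selection is done in C inside repack.cpp, not through a function like this.

```python
from enum import Enum


class Kernel(Enum):
    # Hypothetical labels for the three code paths mentioned in the commit.
    SVE256 = "sve256"
    NEON = "neon"
    GENERIC = "generic"


def select_gemm_kernel(has_sve: bool, sve_vector_bits: int, has_neon: bool) -> Kernel:
    """Pick a q4_K x q8_K GEMM path (hypothetical helper, illustration only)."""
    # The SVE kernel assumes exactly 256-bit vectors, so other SVE
    # vector lengths (e.g. 128 or 512 bits) must not take this path.
    if has_sve and sve_vector_bits == 256:
        return Kernel.SVE256
    # Before the fix, non-256-bit SVE parts fell through to the generic
    # path; preferring NEON keeps them on vectorized code.
    if has_neon:
        return Kernel.NEON
    # Scalar fallback when no suitable SIMD path is available.
    return Kernel.GENERIC
```

For example, a CPU reporting 128-bit SVE and NEON would get `Kernel.NEON` rather than `Kernel.GENERIC`, which is the performance regression the last squashed message addresses.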
Georgi Gerganov ff4affb4c1 sync : ggml 2026-02-15 22:24:29 +02:00
Georgi Gerganov 55d58599c8 ggml : bump version to 0.9.7 (ggml/1425) 2026-02-15 22:24:29 +02:00
Georgi Gerganov 1a8c700bfd ggml : bump version to 0.9.6 (ggml/1423) 2026-02-15 22:24:29 +02:00
David Friehs 27b93cbd15
cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization (#19624)
* cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization

- load all 8 int8 for a grid position in one load
- calculate signs via popcnt instead of fetching from ksigns table
- broadcast signs to drop individual shift/mask

* cuda: iq2xxs: simplify sum scaling

express `(sum * scale + sum / 2) / 4` as `(sum * (scale * 2 + 1)) / 8`
express `((aux32 >> 28) * 2 + 1)` as `(aux32 >> 27 | 1)`

saves 3 registers for mul_mat_vec_q (152 -> 149) according to nsight
AFAICT no overflow can occur here as iq2xxs values are far too small

* uint -> uint32_t

error: identifier "uint" is undefined
2026-02-15 22:38:42 +05:30
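The two rewrites in this commit (signs computed via popcnt, and the simplified scale arithmetic) can be checked with a small Python sketch. It assumes that the `ksigns` table stores seven explicit sign bits plus an eighth bit chosen so that every byte has an even number of set bits, and that a set bit means "negate". The scale identity is checked with C-style truncating division, a non-negative 4-bit scale, and no overflow. All helper names are hypothetical; the real code is CUDA.

```python
def ksign(i: int) -> int:
    """Sign byte for a 7-bit index: 7 explicit bits plus an even-parity bit 7."""
    parity = bin(i).count("1") & 1  # popcount parity of the 7 explicit bits
    return i | (parity << 7)


def apply_signs(values, sign_byte):
    """Negate values[j] whenever bit j of sign_byte is set (8 values per byte)."""
    return [-v if (sign_byte >> j) & 1 else v for j, v in enumerate(values)]


def c_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, as in C/CUDA."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def scale_old(s: int, scale: int) -> int:
    # (sum * scale + sum / 2) / 4 with C integer semantics
    return c_div(s * scale + c_div(s, 2), 4)


def scale_new(s: int, scale: int) -> int:
    # (sum * (scale * 2 + 1)) / 8 with C integer semantics
    return c_div(s * (scale * 2 + 1), 8)


def ls_old(aux32: int) -> int:
    # ((aux32 >> 28) * 2 + 1) on an unsigned 32-bit value
    return ((aux32 & 0xFFFFFFFF) >> 28) * 2 + 1


def ls_new(aux32: int) -> int:
    # (aux32 >> 27 | 1): bits 31..27, with the lowest bit forced to 1
    return ((aux32 & 0xFFFFFFFF) >> 27) | 1
```

Shifting by 27 keeps bits 31..27, which equals `(aux32 >> 28) * 2` plus bit 27; OR-ing with 1 then overwrites bit 27's contribution, so both forms give the same odd multiplier. For the scale identity, both sides agree for non-negative sums, and because truncating division is odd in its numerator, they also agree for negative sums.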
Aaron Teo 6e67fd2144
docs: update s390x build docs (#19643) 2026-02-16 00:33:34 +08:00
Adrien Gallouët 9e118b97c4
build : remove LLAMA_HTTPLIB option (#19623)
This option was introduced as a workaround because cpp-httplib could not
build on visionOS. Since that has been fixed and cpp-httplib now compiles on all
platforms, we can remove the option and simplify many things.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-15 15:38:50 +01:00
Daniel Bevenius 57088276d4
cmake : check if KleidiAI API has been fetched (#19640)
This commit addresses a build issue with the KleidiAI backend when
building multiple CPU backends. Commit
3a00c98584 ("cmake : fix KleidiAI install
target failure with EXCLUDE_FROM_ALL") introduced a change where
FetchContent_Populate is called instead of FetchContent_MakeAvailable.
The latter handles this case because it is idempotent, whereas
FetchContent_Populate is not.

I missed this during my review and should not have committed without
verifying the CI failure; sorry about that.
2026-02-15 13:59:38 +01:00
Georgi Gerganov 341bc7d23c
context : fix output reorder with backend sampling (#19638) 2026-02-15 14:57:40 +02:00
Georgi Gerganov 08e6d914b8
ggml : avoid UB in gemm ukernel (#19642) 2026-02-15 14:56:35 +02:00