llama.cpp

Commit Graph

Author	SHA1	Message	Date
Piotr Wilkin	0a2090a8d6	Regenerate documentation	2026-02-16 22:39:12 +01:00
Piotr Wilkin	5164f2f3c8	Fix case with object inside object, refactor long methods.	2026-02-16 22:39:12 +01:00
Piotr Wilkin	8397fdddc6	Fix number partial parsing issue	2026-02-16 22:39:12 +01:00
Piotr Wilkin	5df5390c72	More edge cases	2026-02-16 22:39:12 +01:00
Piotr Wilkin	971b216ce1	Fix pesky issue on optional trailing arguments in function calls for TAGGED format	2026-02-16 22:39:11 +01:00
Piotr Wilkin	fcc61e6759	Remove [[noreturn]] as it causes compilation problems on Mac.	2026-02-16 22:39:11 +01:00
Piotr Wilkin	b223a7b1aa	We don't like segfaults (or failing tests).	2026-02-16 22:39:11 +01:00
Piotr Wilkin	4249e9889f	Fix minor regressions, add [[noreturn]] attrib	2026-02-16 22:39:11 +01:00
Piotr Wilkin	0abe32a3d8	Fix incorrect coercion of strings to non-string types during parsing	2026-02-16 22:39:11 +01:00
Piotr Wilkin	f1937febff	Feeding the hungry editor checker god.	2026-02-16 22:39:11 +01:00
Piotr Wilkin	c35b31abe5	Fix error in argument processing	2026-02-16 22:39:11 +01:00
Piotr Wilkin	5cabb3c737	Revert bad change, fix some templates and most tests	2026-02-16 22:39:11 +01:00
Piotr Wilkin	bb6337fb90	More robust reasoning detection	2026-02-16 22:39:11 +01:00
Piotr Wilkin	169a0fa0f6	Fix reasoning detection	2026-02-16 22:39:11 +01:00
Piotr Wilkin	2eedbb24e0	Quick vibe-coded fix for proper object printing	2026-02-16 22:39:11 +01:00
Piotr Wilkin	a4feadb10d	Missed this.	2026-02-16 22:39:11 +01:00
Piotr Wilkin	1e3d93cb6b	ANOTHER GIANT POST-FIXUP SQUISH	2026-02-16 22:39:11 +01:00
Piotr Wilkin	52d31fa024	THE GIANT AUTOPARSER SQUISH	2026-02-16 22:39:11 +01:00
Piotr Wilkin	052ad2ab8a	Make call IDs nine-character	2026-02-16 22:39:11 +01:00
Piotr Wilkin	47a7ebc0c1	Fix sanitizer warnings	2026-02-16 22:39:11 +01:00
Piotr Wilkin	b403c9aaa2	Fix bad typo	2026-02-16 22:39:11 +01:00
Piotr Wilkin	f2a4ae6ba8	Add workaround for templates requiring non-null content	2026-02-16 22:39:11 +01:00
AesSedai	d612901116	perplexity: add proper batching (#19661 )	2026-02-16 18:44:44 +02:00
Ivan Chikish	cceb1b4e33	common : inline functions (#18639 )	2026-02-16 17:52:24 +02:00
Judd	d23a55997d	ggml : make `ggml_is_view` as API (#19539 ) * make `ggml_is_view` as API * introduce `ggml_aux_is_view` as inline version for internal use. * change `ggml_aux_is_view` to `ggml_impl_is_view`	2026-02-16 17:43:34 +02:00
Saurabh Dash	5f28c53d11	model: Add support for Tiny Aya Models (#19611 ) * changes for tiny aya * changes to hash * changes to vocab * fix some tokenizer regex edge cases * update comment * add some comments for regex * Apply suggestion from @ngxson --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2026-02-16 16:28:46 +01:00
Adrien Gallouët	4408494144	build : rework llama_option_depr to handle LLAMA_CURL (#19658 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-02-16 16:06:48 +01:00
Mario Limonciello	2ba9adc093	Adjust workaround for ROCWMMA_FATTN/GFX9 to only newer ROCm versions (#19591 ) Avoids issues with ROCm 6.4.4. Closes: https://github.com/ggml-org/llama.cpp/issues/19580 Fixes: `6845f7f87` ("Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (#19461)") Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2026-02-16 14:46:08 +01:00
Georgi Gerganov	cc45f2ada6	models : deduplicate delta-net graphs for Qwen family (#19597 ) * models : add llm_build_delta_net_base * cont : keep qwen35 and qwen35moe graphs intact * cont : add comments	2026-02-16 14:35:04 +02:00
Georgi Gerganov	d5dfc33027	graph : fix KQ mask, lora, cvec reuse checks (#19644 ) * graph : fix KQ mask reuse condition * cont : dedup KQ mask build and can_reuse * cont : fix build * graph : fix adapter check for reuse	2026-02-16 09:21:11 +02:00
abhijain1204fujitsu	267ba5a1d9	ggml: aarch64: Implement SVE in Gemm q4_k 8x8 q8_k Kernel (#19132 ) * Updated repack.cpp * Updated repack.cpp * Updated repack.cpp * Added if condition to support only vector length 256. * Changed the format removed comments and duplicate variable * If SVE 256 not present then was using generic function to compute, hence slowing the performance. So added code if SVE 256 is not present then use NEON code. * Code format change suggestion --------- Co-authored-by: Vithule, Prashant <Prashant.Vithule@fujitsu.com>	2026-02-16 14:38:43 +08:00
Georgi Gerganov	ff4affb4c1	sync : ggml	2026-02-15 22:24:29 +02:00
Georgi Gerganov	55d58599c8	ggml : bump version to 0.9.7 (ggml/1425)	2026-02-15 22:24:29 +02:00
Georgi Gerganov	1a8c700bfd	ggml : bump version to 0.9.6 (ggml/1423)	2026-02-15 22:24:29 +02:00
David Friehs	27b93cbd15	cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization (#19624 ) * cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization - load all 8 int8 for a grid position in one load - calculate signs via popcnt instead of fetching from ksigns table - broadcast signs to drop individual shift/mask * cuda: iq2xxs: simplify sum scaling express `(sum * scale + sum / 2) / 4` as `(sum * (scale * 2 + 1)) / 8` express `((aux32 >> 28) * 2 + 1)` as `(aux32 >> 27 | 1)` saves 3 registers for mul_mat_vec_q (152 -> 149) according to nsight AFAICT no overflow can occur here as iq2xxs values are far too small * uint -> uint32_t error: identifier "uint" is undefined	2026-02-15 22:38:42 +05:30
Aaron Teo	6e67fd2144	docs: update s390x build docs (#19643 )	2026-02-16 00:33:34 +08:00
Adrien Gallouët	9e118b97c4	build : remove LLAMA_HTTPLIB option (#19623 ) This option was introduced as a workaround because cpp-httplib could not build on visionOS. Since it has been fixed and now compiles on all platforms, we can remove it and simplify many things. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-02-15 15:38:50 +01:00
Daniel Bevenius	57088276d4	cmake : check if KleidiAI API has been fetched (#19640 ) This commit addresses a build issue with the KleidiAI backend when building multiple cpu backends. Commit `3a00c98584` ("cmake : fix KleidiAI install target failure with EXCLUDE_FROM_ALL") introduced a change where FetchContent_Populate is called instead of FetchContent_MakeAvailable, where the latter does handle this case (it is idempotent but FetchContent_Populate is not). I missed this during my review and I should not have committed without verifying the CI failure, sorry about that.	2026-02-15 13:59:38 +01:00
Georgi Gerganov	341bc7d23c	context : fix output reorder with backend sampling (#19638 )	2026-02-15 14:57:40 +02:00
Georgi Gerganov	08e6d914b8	ggml : avoid UB in gemm ukernel (#19642 )	2026-02-15 14:56:35 +02:00
Aaron Teo	184c694f45	ggml-cpu: optimize ggml_vec_dot_bf16 for s390x (#19399 )	2026-02-15 18:20:35 +08:00
Aman Gupta	684b36101c	ggml-cpu: FA add GEMM microkernel (#19422 ) * ggml-cpu: FA add GEMM microkernel * add guard for sizeless vector types * fix case where DV % GGML_F32_EPR !=0 * move memset out of the loop * move another memset out of the loop * use RM=4 for arm * simd_gemm: convert everything to int * convert everything to size_t to avoid warnings * fixup * add pragma for ignoring aggressive loop optimizations	2026-02-15 11:09:24 +05:30
SamareshSingh	3a00c98584	cmake : fix KleidiAI install target failure with EXCLUDE_FROM_ALL (#19581 ) * cmake: fix KleidiAI install target failure with EXCLUDE_FROM_ALL Fix for the bug #19501 by adding EXCLUDE_FROM_ALL to FetchContent_Declare. This properly excludes KleidiAI from both build and install targets, preventing install failures when GGML_CPU_KLEIDIAI=ON is used. The KleidiAI source files are still compiled into libggml-cpu.so, preserving all functionality. * addressed code review comments	2026-02-15 06:22:53 +01:00
Sigbjørn Skjæret	079feab9e3	convert : ensure all models handle new experts count (#19621 ) * ensure all models handle new experts count * revert removal for PhiMoeModel, does not inherit from base	2026-02-14 22:22:32 +01:00
Anav Prasad	01d8eaa28d	mtmd : Add Nemotron Nano 12B v2 VL support (#19547 ) * nemotron nano v2 vlm support added * simplified code; addressed reviews * pre-downsample position embeddings during GGUF conversion for fixed input size	2026-02-14 14:07:00 +01:00
Georgi Gerganov	1725e316c1	models : optimize qwen3next graph (#19375 ) * models : optimizing qwen3next graph * cont * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * cont : remove redundant q, g chunking * minor * minor * avoid passing masks around * avoid concats during chunking * naming + shapes * update names and use prefix to disable CUDA graphs	2026-02-14 12:57:36 +02:00
Adrien Gallouët	b7742cf321	ggml : fix GGML_DEBUG with OpenMP (#19599 ) last_graph is only available without OpenMP, but ggml_graph_compute_thread() is called in both cases. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-02-14 11:22:57 +01:00
iMil	badba89320	NetBSD build support (#19589 )	2026-02-14 09:47:01 +01:00
Aleksander Grygier	baa12f3831	webui: Architecture and UI improvements (#19596 )	2026-02-14 09:06:41 +01:00
agent-enemy-2	2d8015e8a4	llama : update LoRA API. + fix excessive graph reserves (#19280 ) * Refactoring to use new llama_put_adapter_loras * cont : alternative lora API --------- Co-authored-by: Jake Chavis <jakechavis6@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-02-14 10:06:27 +02:00

8098 Commits
