Commit Graph

8412 Commits

Author SHA1 Message Date
Imad Saddik 333bfc7231 chore: undo changes in ChatScreenProcessingInfo 2026-03-15 17:53:31 +00:00
Imad Saddik c6c63786c2 chore: update webui build output 2026-03-15 17:49:34 +00:00
Imad Saddik de04a9b0e6 style: reset the width of the processing info div 2026-03-15 17:48:16 +00:00
Imad Saddik 715ba4ee85 chore: update webui build output 2026-03-15 17:40:36 +00:00
Imad Saddik fa7d3a96c5 fix: keep the container spanning the whole width to fix scroll bar issue 2026-03-15 17:39:19 +00:00
Imad Saddik c399ec3c46 chore: update webui build output 2026-03-15 17:36:11 +00:00
Imad Saddik 72c1928dc9 refactor: move widthClasses.class to the container div 2026-03-15 17:34:31 +00:00
Imad Saddik 0cd953eea2 chore: update webui build output 2026-03-15 17:13:14 +00:00
Imad Saddik 6074619ba4 style: restore class for checkbox labels 2026-03-15 17:11:58 +00:00
Imad Saddik 297abf8450 chore: update webui build output 2026-03-15 17:09:59 +00:00
Imad Saddik d4034eff07 fix: update chatWidthClasses to use autoChatWidth configuration 2026-03-15 17:08:43 +00:00
Imad Saddik b73209694d chore: update webui build output 2026-03-15 17:03:58 +00:00
Imad Saddik 2630c27754 refactor: simplify chatWidthClasses getter logic and remove widthClasses.class 2026-03-15 17:02:41 +00:00
Imad Saddik 1a6f21f25c chore: revert package-lock.json to match master 2026-03-15 16:56:41 +00:00
Imad Saddik 95be04617e chore: update webui build output 2026-03-15 16:56:08 +00:00
Imad Saddik 2836834801 refactor: remove anything related to the custom chat width setting 2026-03-15 16:54:43 +00:00
Imad Saddik 20a8227933 chore: update webui build output 2026-03-15 16:44:35 +00:00
Imad Saddik 89647d5daf chore: downgrade @lucide/svelte version and remove custom chat width component 2026-03-15 16:43:18 +00:00
Imad Saddik c55533a706 chore: update webui build output 2026-03-14 09:00:23 +00:00
Imad Saddik 29ede762c4 refactor: don't reset custom chat width when the auto width is checked 2026-03-14 08:59:02 +00:00
Imad Saddik e8eccf9b35 feat: update chatWidthClasses to prioritize auto chat width 2026-03-14 08:57:25 +00:00
Imad Saddik 23758f3ba8 feat: add syncable parameters for auto and custom chat width 2026-03-14 08:55:56 +00:00
Imad Saddik b7851305df chore: update webui build output 2026-03-14 08:29:24 +00:00
Imad Saddik e2a6be14e7 fix: pass style to ChatMessageUser 2026-03-14 08:09:33 +00:00
Imad Saddik bcc95c98cb refactor: remove chatWidthClasses from ChatForm 2026-03-14 08:05:20 +00:00
Imad Saddik 16fcb29197 fix: use widthClasses in ChatScreenForm 2026-03-14 08:03:48 +00:00
Imad Saddik 5a721b5678 chore: update webui build output 2026-03-14 06:57:42 +00:00
Imad Saddik ee944af476 style: fix indentation and formatting in ChatForm.svelte 2026-03-14 06:54:26 +00:00
Imad Saddik 8f9571a5c2 refactor: use derived on chatWidthClasses for consistency 2026-03-14 06:53:13 +00:00
Imad Saddik 0306577300 chore: update webui build output 2026-03-14 06:48:49 +00:00
Imad Saddik c3cb3fcfcd refactor: remove unused chat width parameters from syncable parameters 2026-03-14 06:47:28 +00:00
Imad Saddik 19986697e3 chore: update webui build output 2026-03-14 06:46:26 +00:00
Imad Saddik 8cef196854 style: fix formatting 2026-03-14 06:44:43 +00:00
Imad Saddik 4561f25021 refactor: call chatWidthClasses once and reuse it everywhere 2026-03-14 06:43:19 +00:00
Imad Saddik 165234e722 fix: correct typo in disabled message for automatic width 2026-03-14 06:37:06 +00:00
Imad Saddik b9545a1021 chore: update webui build output 2026-03-14 06:33:38 +00:00
Imad Saddik 5dd9b7d888 Merge branch 'master' into feat/change_chat_screen_width 2026-03-14 06:32:15 +00:00
Adrien Gallouët 77e20cc107
vendor : update cpp-httplib to 0.37.2 (#20484)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-14 06:51:02 +01:00
Rail Chabdarov 5a32a9b8a5
Fix data race in CUDA's "cpy" kernel (influences GGML's DUP, CONT operations). (#20507)
* Fix data race in CUDA's "cpy" kernel.

* Remove extra barrier by using more of shared memory.
2026-03-14 13:19:44 +08:00
lhez 3b439504ba
opencl: fix l2_norm (#20480) 2026-03-13 22:18:52 -07:00
Adrien Gallouët 463b6a963c
tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice (#19954)
llama-perplexity -hf unsloth/Qwen3-0.6B-GGUF:Q4_K_M -f winogrande-debiased-eval.csv --winogrande

    winogrande_score : tokenizing selected tasks
    winogrande_score : calculating winogrande score over selected tasks.
    split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
    decode: failed to find a memory slot for batch of size 46
    failed to decode the batch, n_batch = 2048, ret = 1
    winogrande_score: llama_decode() failed

same for hellaswag:

    split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
    decode: failed to find a memory slot for batch of size 99
    failed to decode the batch, n_batch = 2048, ret = 1
    hellaswag_score: llama_decode() failed

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-13 21:25:57 +01:00
Georgi Gerganov e30f1fdf74
graph : remove redundant GDN state transposes (#20443)
* ggml : transpose fused GDN state access for coalesced memory reads (#20436)

The fused Gated Delta Net kernel accessed the [S_v, S_v] state matrix
column-wise on row-major storage, causing strided reads (stride S_v =
128 floats = 512 bytes) that waste GPU cache bandwidth. This produced a
39% regression on Qwen3.5-9B (Metal, M4 Max) compared to the unfused
path.

Transpose the state indexing so threads read contiguously:
- Metal: s_ptr[is*S_v] -> s_ptr[is] (stride 1 vs S_v)
- CUDA:  curr_state[i*S_v+col] -> curr_state[col*S_v+i] (coalesced)
- CPU:   restructured loops for row-wise transposed access

Also add --fused-gdn [on|off|auto] CLI flag (mirrors --flash-attn) so
users can control fused GDN independently of auto-detection.

All GATED_DELTA_NET backend-ops tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ggml : use SIMD dot products in CPU GDN kernel, couple AR/chunked fused flags

- Replace scalar inner loops with ggml_vec_dot_f32 for SIMD-optimized
  dot products in the CPU fused GDN kernel (delta and attention output)
- Couple fused_gdn_ar and fused_gdn_ch flags in auto-detection: if one
  path lacks device support, disable both to prevent state layout mismatch
  between transposed (fused) and non-transposed (unfused) formats

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* llama : revert fgdn argument changes

* graph : remove GDN state transposes

* vulkan : adapt

* cuda : remove obsolete smem code

---------

Co-authored-by: Paul Flynn <paul@arkavo.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Oliver Simons <osimons@nvidia.com>
2026-03-13 22:12:54 +02:00
Piotr Wilkin (ilintar) 1430c35948
common/parser: gracefully handle undetected tool parser, print error message. (#20286) 2026-03-13 20:56:10 +01:00
ZeroV0LT f17b3be63f
llama : fix pooling assertion crash in chunked GDN detection path (#20468)
* llama : fix pooling assertion crash in chunked GDN detection path

The chunked fused Gated Delta Net detection in sched_reserve() calls
graph_reserve(16*n_seqs, n_seqs, n_outputs, ...) where n_outputs = n_seqs.
This creates a dimension mismatch in build_pooling() for embedding models
with mean/rank pooling: build_inp_mean() creates a tensor with shape
[n_tokens=16*n_seqs, ...] while t_embd is reduced to [n_outputs=n_seqs, ...]
via out_ids, causing ggml_mul_mat to assert on ggml_can_mul_mat(a, b).

Fix: pass n_tokens as n_outputs in the chunked GDN graph reservation,
matching the pattern used by the pp/tg worst-case reservations.

Regression introduced by #20340 (d28961d).
Same class of bug as #12517, fixed by #12545.

* server : add mean pooling tests to embedding test suite

Add test_embedding_pooling_mean and test_embedding_pooling_mean_multiple
to cover the --pooling mean codepath, which was previously untested.

These tests would have caught the regression introduced by #20340 where
build_pooling() crashes with a ggml_mul_mat assertion due to mismatched
dimensions in the chunked GDN detection path.

---------

Co-authored-by: Domenico Crupi <domenico@zerovolt.it>
2026-03-13 20:53:42 +02:00
SoftwareRenderer d7ba99c485
server: reset counter related to kill-switch on client error (#20513)
* server: reset kill-switch on client error

This avoids triggering a server kill switch.

If the client sends a request that exceeds the configured context size, an appropriate HTTP 400 response is provided and no tokens are generated.

However, since no tokens are generated, update_slots() increments n_empty_consecutive. If the client sends 3 such messages in a row, the server terminates.

* moved counter reset as per recommendation

* cont : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-13 19:58:09 +02:00
rehan-10xengineer fbaa95bc29
ggml-cpu: add RVV vec dot kernels for quantization types (#18859)
* ggml-cpu: add rvv quantize_row_q8_K kernel

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: add rvv vec_dot for iq4_nl, mxfp4, iq2_xxs

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: add rvv vec_dot for iq4_xs, refactor

* ggml-cpu: remove ifunc for rvv vec dot

* ggml-cpu: add vec_dot for iq2_xs, iq3_xxs

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: refactor quants.c

---------

Co-authored-by: taimur-10x <taimur.ahmad@10xengineers.ai>
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
Co-authored-by: Rehan Qasim <rehanbhatti0317@gmail.com>
2026-03-13 17:36:04 +02:00
Adrien Gallouët b5e1212063
ggml : fix typo gmml (#20512)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-13 14:36:13 +01:00
Daniel Bevenius 8f974d2392
mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate (#20105)
This commit renames the function `mtmd_get_audio_bitrate` to
`mtmd_get_audio_sample_rate` to better reflect its purpose.

The motivation for this is that the function currently returns the audio
sample rate, not the bitrate (sample_rate × bit_depth × channels), and
that is how it is used in the code as well.

This is a breaking change, but I believe mtmd is still in
experimental/development phase so it might be alright to simply rename.
2026-03-13 12:30:02 +01:00
Piotr Wilkin (ilintar) 2948e6049a
general: CONTRIBUTING.md - guidelines for quantization schemes (#19762)
* Guidelines for quantization schemes

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Change required precision from Q8 to FP16/BF16

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update CONTRIBUTING.md [no ci]

* Update CONTRIBUTING.md [no ci]

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-13 12:21:33 +01:00
Georgi Gerganov 73c9eb8ced
metal : fix l2 norm scale (#20493) 2026-03-13 11:43:20 +02:00