Commit Graph

8412 Commits

Author SHA1 Message Date
Imad Saddik 333bfc7231 chore: undo changes in ChatScreenProcessingInfo 2026-03-15 17:53:31 +00:00
Imad Saddik c6c63786c2 chore: update webui build output 2026-03-15 17:49:34 +00:00
Imad Saddik de04a9b0e6 style: reset the width of the processing info div 2026-03-15 17:48:16 +00:00
Imad Saddik 715ba4ee85 chore: update webui build output 2026-03-15 17:40:36 +00:00
Imad Saddik fa7d3a96c5 fix: keep the container spanning the whole width to fix scroll bar issue 2026-03-15 17:39:19 +00:00
Imad Saddik c399ec3c46 chore: update webui build output 2026-03-15 17:36:11 +00:00
Imad Saddik 72c1928dc9 refactor: move widthClasses.class to the container div 2026-03-15 17:34:31 +00:00
Imad Saddik 0cd953eea2 chore: update webui build output 2026-03-15 17:13:14 +00:00
Imad Saddik 6074619ba4 style: restore class for checkbox labels 2026-03-15 17:11:58 +00:00
Imad Saddik 297abf8450 chore: update webui build output 2026-03-15 17:09:59 +00:00
Imad Saddik d4034eff07 fix: update chatWidthClasses to use autoChatWidth configuration 2026-03-15 17:08:43 +00:00
Imad Saddik b73209694d chore: update webui build output 2026-03-15 17:03:58 +00:00
Imad Saddik 2630c27754 refactor: simplify chatWidthClasses getter logic and remove widthClasses.class 2026-03-15 17:02:41 +00:00
Imad Saddik 1a6f21f25c chore: revert package-lock.json to match master 2026-03-15 16:56:41 +00:00
Imad Saddik 95be04617e chore: update webui build output 2026-03-15 16:56:08 +00:00
Imad Saddik 2836834801 refactor: remove anything related to the custom chat width setting 2026-03-15 16:54:43 +00:00
Imad Saddik 20a8227933 chore: update webui build output 2026-03-15 16:44:35 +00:00
Imad Saddik 89647d5daf chore: downgrade @lucide/svelte version and remove custom chat width component 2026-03-15 16:43:18 +00:00
Imad Saddik c55533a706 chore: update webui build output 2026-03-14 09:00:23 +00:00
Imad Saddik 29ede762c4 refactor: don't reset custom chat width when the auto width is checked 2026-03-14 08:59:02 +00:00
Imad Saddik e8eccf9b35 feat: update chatWidthClasses to prioritize auto chat width 2026-03-14 08:57:25 +00:00
Imad Saddik 23758f3ba8 feat: add syncable parameters for auto and custom chat width 2026-03-14 08:55:56 +00:00
Imad Saddik b7851305df chore: update webui build output 2026-03-14 08:29:24 +00:00
Imad Saddik e2a6be14e7 fix: pass style to ChatMessageUser 2026-03-14 08:09:33 +00:00
Imad Saddik bcc95c98cb refactor: remove chatWidthClasses from ChatForm 2026-03-14 08:05:20 +00:00
Imad Saddik 16fcb29197 fix: use widthClasses in ChatScreenForm 2026-03-14 08:03:48 +00:00
Imad Saddik 5a721b5678 chore: update webui build output 2026-03-14 06:57:42 +00:00
Imad Saddik ee944af476 style: fix indentation and formatting in ChatForm.svelte 2026-03-14 06:54:26 +00:00
Imad Saddik 8f9571a5c2 refactor: use derived on chatWidthClasses for consistency 2026-03-14 06:53:13 +00:00
Imad Saddik 0306577300 chore: update webui build output 2026-03-14 06:48:49 +00:00
Imad Saddik c3cb3fcfcd refactor: remove unused chat width parameters from syncable parameters 2026-03-14 06:47:28 +00:00
Imad Saddik 19986697e3 chore: update webui build output 2026-03-14 06:46:26 +00:00
Imad Saddik 8cef196854 style: fix formatting 2026-03-14 06:44:43 +00:00
Imad Saddik 4561f25021 refactor: call chatWidthClasses once and reuse it everywhere 2026-03-14 06:43:19 +00:00
Imad Saddik 165234e722 fix: correct typo in disabled message for automatic width 2026-03-14 06:37:06 +00:00
Imad Saddik b9545a1021 chore: update webui build output 2026-03-14 06:33:38 +00:00
Imad Saddik 5dd9b7d888 Merge branch 'master' into feat/change_chat_screen_width 2026-03-14 06:32:15 +00:00
Adrien Gallouët 77e20cc107
vendor : update cpp-httplib to 0.37.2 (#20484)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-14 06:51:02 +01:00
Rail Chabdarov 5a32a9b8a5
Fix data race in CUDA's "cpy" kernel (influences GGML's DUP, CONT operations). (#20507)
* Fix data race in CUDA's "cpy" kernel.

* Remove extra barrier by using more of shared memory.
2026-03-14 13:19:44 +08:00
lhez 3b439504ba
opencl: fix l2_norm (#20480) 2026-03-13 22:18:52 -07:00
Adrien Gallouët 463b6a963c
tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice (#19954)
llama-perplexity -hf unsloth/Qwen3-0.6B-GGUF:Q4_K_M -f winogrande-debiased-eval.csv --winogrande

    winogrande_score : tokenizing selected tasks
    winogrande_score : calculating winogrande score over selected tasks.
    split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
    decode: failed to find a memory slot for batch of size 46
    failed to decode the batch, n_batch = 2048, ret = 1
    winogrande_score: llama_decode() failed

same for hellaswag:

    split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
    decode: failed to find a memory slot for batch of size 99
    failed to decode the batch, n_batch = 2048, ret = 1
    hellaswag_score: llama_decode() failed

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-13 21:25:57 +01:00
Georgi Gerganov e30f1fdf74
graph : remove redundant GDN state transposes (#20443)
* ggml : transpose fused GDN state access for coalesced memory reads (#20436)

The fused Gated Delta Net kernel accessed the [S_v, S_v] state matrix
column-wise on row-major storage, causing strided reads (stride S_v =
128 floats = 512 bytes) that waste GPU cache bandwidth. This produced a
39% regression on Qwen3.5-9B (Metal, M4 Max) compared to the unfused
path.

Transpose the state indexing so threads read contiguously:
- Metal: s_ptr[is*S_v] -> s_ptr[is] (stride 1 vs S_v)
- CUDA:  curr_state[i*S_v+col] -> curr_state[col*S_v+i] (coalesced)
- CPU:   restructured loops for row-wise transposed access

Also add --fused-gdn [on|off|auto] CLI flag (mirrors --flash-attn) so
users can control fused GDN independently of auto-detection.

All GATED_DELTA_NET backend-ops tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ggml : use SIMD dot products in CPU GDN kernel, couple AR/chunked fused flags

- Replace scalar inner loops with ggml_vec_dot_f32 for SIMD-optimized
  dot products in the CPU fused GDN kernel (delta and attention output)
- Couple fused_gdn_ar and fused_gdn_ch flags in auto-detection: if one
  path lacks device support, disable both to prevent state layout mismatch
  between transposed (fused) and non-transposed (unfused) formats

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* llama : revert fgdn argument changes

* graph : remove GDN state transposes

* vulkan : adapt

* cuda : remove obsolete smem code

---------

Co-authored-by: Paul Flynn <paul@arkavo.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Oliver Simons <osimons@nvidia.com>
2026-03-13 22:12:54 +02:00
Piotr Wilkin (ilintar) 1430c35948
common/parser: gracefully handle undetected tool parser, print error message. (#20286) 2026-03-13 20:56:10 +01:00
ZeroV0LT f17b3be63f
llama : fix pooling assertion crash in chunked GDN detection path (#20468)
* llama : fix pooling assertion crash in chunked GDN detection path

The chunked fused Gated Delta Net detection in sched_reserve() calls
graph_reserve(16*n_seqs, n_seqs, n_outputs, ...) where n_outputs = n_seqs.
This creates a dimension mismatch in build_pooling() for embedding models
with mean/rank pooling: build_inp_mean() creates a tensor with shape
[n_tokens=16*n_seqs, ...] while t_embd is reduced to [n_outputs=n_seqs, ...]
via out_ids, causing ggml_mul_mat to assert on ggml_can_mul_mat(a, b).

Fix: pass n_tokens as n_outputs in the chunked GDN graph reservation,
matching the pattern used by the pp/tg worst-case reservations.

Regression introduced by #20340 (d28961d).
Same class of bug as #12517, fixed by #12545.

* server : add mean pooling tests to embedding test suite

Add test_embedding_pooling_mean and test_embedding_pooling_mean_multiple
to cover the --pooling mean codepath, which was previously untested.

These tests would have caught the regression introduced by #20340 where
build_pooling() crashes with a ggml_mul_mat assertion due to mismatched
dimensions in the chunked GDN detection path.

---------

Co-authored-by: Domenico Crupi <domenico@zerovolt.it>
2026-03-13 20:53:42 +02:00
SoftwareRenderer d7ba99c485
server: reset counter related to kill-switch on client error (#20513)
* server: reset kill-switch on client error

This avoids triggering a server kill switch.

If the client sends a request that exceeds the configured context size, an appropriate HTTP 400 response is provided and no tokens are generated.

However, since no tokens are generated, update_slots() increments n_empty_consecutive. If the client sends 3 such messages in a row, the server terminates.

* moved counter reset as per recommendation

* cont : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-13 19:58:09 +02:00
rehan-10xengineer fbaa95bc29
ggml-cpu: add RVV vec dot kernels for quantization types (#18859)
* ggml-cpu: add rvv quantize_row_q8_K kernel

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: add rvv vec_dot for iq4_nl, mxfp4, iq2_xxs

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: add rvv vec_dot for iq4_xs, refactor

* ggml-cpu: remove ifunc for rvv vec dot

* ggml-cpu: add vec_dot for iq2_xs, iq3_xxs

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: refactor quants.c

---------

Co-authored-by: taimur-10x <taimur.ahmad@10xengineers.ai>
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
Co-authored-by: Rehan Qasim <rehanbhatti0317@gmail.com>
2026-03-13 17:36:04 +02:00
Adrien Gallouët b5e1212063
ggml : fix typo gmml (#20512)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-13 14:36:13 +01:00
Daniel Bevenius 8f974d2392
mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate (#20105)
This commit renames the function `mtmd_get_audio_bitrate` to
`mtmd_get_audio_sample_rate` to better reflect its purpose.

The motivation for this is that the function currently returns the audio
sample rate, not the bitrate (sample_rate × bit_depth × channels), and
that is how it is used in the code as well.

This is a breaking change, but I believe mtmd is still in
experimental/development phase so it might be alright to simply rename.
2026-03-13 12:30:02 +01:00
Piotr Wilkin (ilintar) 2948e6049a
general: CONTRIBUTING.md - guidelines for quantization schemes (#19762)
* Guidelines for quantization schemes

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Change required precision from Q8 to FP16/BF16

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update CONTRIBUTING.md [no ci]

* Update CONTRIBUTING.md [no ci]

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-13 12:21:33 +01:00
Georgi Gerganov 73c9eb8ced
metal : fix l2 norm scale (#20493) 2026-03-13 11:43:20 +02:00