Commit Graph

8411 Commits

Author SHA1 Message Date
Imad Saddik c6c63786c2 chore: update webui build output 2026-03-15 17:49:34 +00:00
Imad Saddik de04a9b0e6 style: reset the width of the processing info div 2026-03-15 17:48:16 +00:00
Imad Saddik 715ba4ee85 chore: update webui build output 2026-03-15 17:40:36 +00:00
Imad Saddik fa7d3a96c5 fix: keep the container spanning the whole width to fix scroll bar issue 2026-03-15 17:39:19 +00:00
Imad Saddik c399ec3c46 chore: update webui build output 2026-03-15 17:36:11 +00:00
Imad Saddik 72c1928dc9 refactor: move widthClasses.class to the container div 2026-03-15 17:34:31 +00:00
Imad Saddik 0cd953eea2 chore: update webui build output 2026-03-15 17:13:14 +00:00
Imad Saddik 6074619ba4 style: restore class for checkbox labels 2026-03-15 17:11:58 +00:00
Imad Saddik 297abf8450 chore: update webui build output 2026-03-15 17:09:59 +00:00
Imad Saddik d4034eff07 fix: update chatWidthClasses to use autoChatWidth configuration 2026-03-15 17:08:43 +00:00
Imad Saddik b73209694d chore: update webui build output 2026-03-15 17:03:58 +00:00
Imad Saddik 2630c27754 refactor: simplify chatWidthClasses getter logic and remove widthClasses.class 2026-03-15 17:02:41 +00:00
Imad Saddik 1a6f21f25c chore: revert package-lock.json to match master 2026-03-15 16:56:41 +00:00
Imad Saddik 95be04617e chore: update webui build output 2026-03-15 16:56:08 +00:00
Imad Saddik 2836834801 refactor: remove anything related to the custom chat width setting 2026-03-15 16:54:43 +00:00
Imad Saddik 20a8227933 chore: update webui build output 2026-03-15 16:44:35 +00:00
Imad Saddik 89647d5daf chore: downgrade @lucide/svelte version and remove custom chat width component 2026-03-15 16:43:18 +00:00
Imad Saddik c55533a706 chore: update webui build output 2026-03-14 09:00:23 +00:00
Imad Saddik 29ede762c4 refactor: don't reset custom chat width when the auto width is checked 2026-03-14 08:59:02 +00:00
Imad Saddik e8eccf9b35 feat: update chatWidthClasses to prioritize auto chat width 2026-03-14 08:57:25 +00:00
Imad Saddik 23758f3ba8 feat: add syncable parameters for auto and custom chat width 2026-03-14 08:55:56 +00:00
Imad Saddik b7851305df chore: update webui build output 2026-03-14 08:29:24 +00:00
Imad Saddik e2a6be14e7 fix: pass style to ChatMessageUser 2026-03-14 08:09:33 +00:00
Imad Saddik bcc95c98cb refactor: remove chatWidthClasses from ChatForm 2026-03-14 08:05:20 +00:00
Imad Saddik 16fcb29197 fix: use widthClasses in ChatScreenForm 2026-03-14 08:03:48 +00:00
Imad Saddik 5a721b5678 chore: update webui build output 2026-03-14 06:57:42 +00:00
Imad Saddik ee944af476 style: fix indentation and formatting in ChatForm.svelte 2026-03-14 06:54:26 +00:00
Imad Saddik 8f9571a5c2 refactor: use derived on chatWidthClasses for consistency 2026-03-14 06:53:13 +00:00
Imad Saddik 0306577300 chore: update webui build output 2026-03-14 06:48:49 +00:00
Imad Saddik c3cb3fcfcd refactor: remove unused chat width parameters from syncable parameters 2026-03-14 06:47:28 +00:00
Imad Saddik 19986697e3 chore: update webui build output 2026-03-14 06:46:26 +00:00
Imad Saddik 8cef196854 style: fix formatting 2026-03-14 06:44:43 +00:00
Imad Saddik 4561f25021 refactor: call chatWidthClasses once and reuse it everywhere 2026-03-14 06:43:19 +00:00
Imad Saddik 165234e722 fix: correct typo in disabled message for automatic width 2026-03-14 06:37:06 +00:00
Imad Saddik b9545a1021 chore: update webui build output 2026-03-14 06:33:38 +00:00
Imad Saddik 5dd9b7d888 Merge branch 'master' into feat/change_chat_screen_width 2026-03-14 06:32:15 +00:00
Adrien Gallouët 77e20cc107
vendor : update cpp-httplib to 0.37.2 (#20484)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-14 06:51:02 +01:00
Rail Chabdarov 5a32a9b8a5
Fix data race in CUDA's "cpy" kernel (influences GGML's DUP, CONT operations). (#20507)
* Fix data race in CUDA's "cpy" kernel.

* Remove extra barrier by using more of shared memory.
2026-03-14 13:19:44 +08:00
lhez 3b439504ba
opencl: fix l2_norm (#20480) 2026-03-13 22:18:52 -07:00
Adrien Gallouët 463b6a963c
tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice (#19954)
llama-perplexity -hf unsloth/Qwen3-0.6B-GGUF:Q4_K_M -f winogrande-debiased-eval.csv --winogrande

    winogrande_score : tokenizing selected tasks
    winogrande_score : calculating winogrande score over selected tasks.
    split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
    decode: failed to find a memory slot for batch of size 46
    failed to decode the batch, n_batch = 2048, ret = 1
    winogrande_score: llama_decode() failed

same for hellaswag:

    split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
    decode: failed to find a memory slot for batch of size 99
    failed to decode the batch, n_batch = 2048, ret = 1
    hellaswag_score: llama_decode() failed

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-13 21:25:57 +01:00
Georgi Gerganov e30f1fdf74
graph : remove redundant GDN state transposes (#20443)
* ggml : transpose fused GDN state access for coalesced memory reads (#20436)

The fused Gated Delta Net kernel accessed the [S_v, S_v] state matrix
column-wise on row-major storage, causing strided reads (stride S_v =
128 floats = 512 bytes) that waste GPU cache bandwidth. This produced a
39% regression on Qwen3.5-9B (Metal, M4 Max) compared to the unfused
path.

Transpose the state indexing so threads read contiguously:
- Metal: s_ptr[is*S_v] -> s_ptr[is] (stride 1 vs S_v)
- CUDA:  curr_state[i*S_v+col] -> curr_state[col*S_v+i] (coalesced)
- CPU:   restructured loops for row-wise transposed access
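
The effect of the indexing change above can be illustrated with a small sketch. It is not the kernel code: the helper names are hypothetical, and it only computes the flat offsets that the two CUDA index expressions quoted above would touch for S_v = 128 with 4-byte floats.

```python
S_V = 128          # state dimension quoted in the commit message
FLOAT_BYTES = 4    # assumption: 32-bit floats, as implied by "128 floats = 512 bytes"

def column_walk(col, s_v=S_V):
    """Flat offsets for i*S_v + col: walking down a column of row-major storage."""
    return [i * s_v + col for i in range(s_v)]

def transposed_walk(col, s_v=S_V):
    """Flat offsets for col*S_v + i: the same logical walk on transposed state."""
    return [col * s_v + i for i in range(s_v)]

def stride_bytes(offsets):
    """Byte distance between consecutive accesses (the stride is uniform here)."""
    return (offsets[1] - offsets[0]) * FLOAT_BYTES

print(stride_bytes(column_walk(3)))      # 512 bytes between consecutive reads
print(stride_bytes(transposed_walk(3)))  # 4 bytes, i.e. contiguous
```

Both walks visit S_v elements; only the distance between consecutive reads changes, which is what determines whether neighbouring accesses can share a cache line.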

Also add --fused-gdn [on|off|auto] CLI flag (mirrors --flash-attn) so
users can control fused GDN independently of auto-detection.

All GATED_DELTA_NET backend-ops tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ggml : use SIMD dot products in CPU GDN kernel, couple AR/chunked fused flags

- Replace scalar inner loops with ggml_vec_dot_f32 for SIMD-optimized
  dot products in the CPU fused GDN kernel (delta and attention output)
- Couple fused_gdn_ar and fused_gdn_ch flags in auto-detection: if one
  path lacks device support, disable both to prevent state layout mismatch
  between transposed (fused) and non-transposed (unfused) formats
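
The coupling rule in the second bullet can be sketched as follows. The function name and boolean inputs are hypothetical; the actual auto-detection lives in the llama.cpp scheduler code and is not reproduced here.

```python
def couple_fused_gdn(ar_supported, ch_supported):
    """Return (use_fused_ar, use_fused_ch) under the coupling rule.

    Both paths are enabled only if both are supported, so the state layout
    (transposed for fused, non-transposed for unfused) never gets mixed.
    """
    both = ar_supported and ch_supported
    return both, both

print(couple_fused_gdn(True, True))   # (True, True)
print(couple_fused_gdn(True, False))  # (False, False)
```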

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* llama : revert fgdn argument changes

* graph : remove GDN state transposes

* vulkan : adapt

* cuda : remove obsolete smem code

---------

Co-authored-by: Paul Flynn <paul@arkavo.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Oliver Simons <osimons@nvidia.com>
2026-03-13 22:12:54 +02:00
Piotr Wilkin (ilintar) 1430c35948
common/parser: gracefully handle undetected tool parser, print error message. (#20286) 2026-03-13 20:56:10 +01:00
ZeroV0LT f17b3be63f
llama : fix pooling assertion crash in chunked GDN detection path (#20468)
* llama : fix pooling assertion crash in chunked GDN detection path

The chunked fused Gated Delta Net detection in sched_reserve() calls
graph_reserve(16*n_seqs, n_seqs, n_outputs, ...) where n_outputs = n_seqs.
This creates a dimension mismatch in build_pooling() for embedding models
with mean/rank pooling: build_inp_mean() creates a tensor with shape
[n_tokens=16*n_seqs, ...] while t_embd is reduced to [n_outputs=n_seqs, ...]
via out_ids, causing ggml_mul_mat to assert on ggml_can_mul_mat(a, b).

Fix: pass n_tokens as n_outputs in the chunked GDN graph reservation,
matching the pattern used by the pp/tg worst-case reservations.
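
A minimal sketch of the mismatch, assuming simplified 2-D shapes and a reduced compatibility check that compares only the shared first dimension (the real ggml_can_mul_mat also checks broadcast dimensions). The helper names and n_embd = 8 are illustrative, not taken from the code base.

```python
def inner_dims_match(a_shape, b_shape):
    """Simplified stand-in for a mul_mat compatibility check: first dims must agree."""
    return a_shape[0] == b_shape[0]

def reservation_shapes(n_seqs, n_outputs, n_embd=8):
    """Shapes seen by mean pooling during the chunked GDN graph reservation."""
    n_tokens = 16 * n_seqs
    mean_shape = (n_tokens, n_seqs)      # pooling weights built over all tokens
    embd_t_shape = (n_outputs, n_embd)   # embeddings after the out_ids reduction
    return embd_t_shape, mean_shape

n_seqs = 4
# Before the fix: n_outputs = n_seqs, so 4 rows meet 64 token weights.
print(inner_dims_match(*reservation_shapes(n_seqs, n_outputs=n_seqs)))       # False
# After the fix: n_outputs = n_tokens, so both sides have 64.
print(inner_dims_match(*reservation_shapes(n_seqs, n_outputs=16 * n_seqs)))  # True
```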

Regression introduced by #20340 (d28961d).
Same class of bug as #12517, fixed by #12545.

* server : add mean pooling tests to embedding test suite

Add test_embedding_pooling_mean and test_embedding_pooling_mean_multiple
to cover the --pooling mean codepath, which was previously untested.

These tests would have caught the regression introduced by #20340 where
build_pooling() crashes with a ggml_mul_mat assertion due to mismatched
dimensions in the chunked GDN detection path.

---------

Co-authored-by: Domenico Crupi <domenico@zerovolt.it>
2026-03-13 20:53:42 +02:00
SoftwareRenderer d7ba99c485
server: reset counter related to kill-switch on client error (#20513)
* server: reset kill-switch on client error

This avoids triggering a server kill switch.

If the client sends a request that exceeds the configured context size, an appropriate HTTP 400 response is provided and no tokens are generated.

However, since no tokens are generated, update_slots() increments n_empty_consecutive. If the client sends 3 such messages in a row, the server terminates.
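
The failure mode can be reproduced with a toy model of the counter. This is a sketch under assumptions: the class, the `>=` comparison, and the exact point where the counter is reset are hypothetical; only the counter name and the threshold of 3 come from the description above.

```python
MAX_EMPTY = 3  # from the description: three empty iterations in a row terminate

class SlotLoop:
    """Toy model of the consecutive-empty-iteration kill switch."""

    def __init__(self, reset_on_client_error):
        self.n_empty_consecutive = 0
        self.reset_on_client_error = reset_on_client_error
        self.terminated = False

    def handle(self, tokens_generated, client_error=False):
        # With the fix, a rejected request (e.g. HTTP 400) clears the counter.
        if client_error and self.reset_on_client_error:
            self.n_empty_consecutive = 0
            return
        if tokens_generated == 0:
            self.n_empty_consecutive += 1
        else:
            self.n_empty_consecutive = 0
        if self.n_empty_consecutive >= MAX_EMPTY:
            self.terminated = True

def run(reset_on_client_error):
    """Send three oversized requests in a row and report whether the loop died."""
    loop = SlotLoop(reset_on_client_error)
    for _ in range(3):
        loop.handle(tokens_generated=0, client_error=True)
    return loop.terminated

print(run(reset_on_client_error=False))  # True: the server would terminate
print(run(reset_on_client_error=True))   # False: the counter was reset each time
```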

* moved counter reset as per recommendation

* cont : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-13 19:58:09 +02:00
rehan-10xengineer fbaa95bc29
ggml-cpu: add RVV vec dot kernels for quantization types (#18859)
* ggml-cpu: add rvv quantize_row_q8_K kernel

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: add rvv vec_dot for iq4_nl, mxfp4, iq2_xxs

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: add rvv vec_dot for iq4_xs, refactor

* ggml-cpu: remove ifunc for rvv vec dot

* ggml-cpu: add vec_dot for iq2_xs, iq3_xxs

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: refactor quants.c

---------

Co-authored-by: taimur-10x <taimur.ahmad@10xengineers.ai>
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
Co-authored-by: Rehan Qasim <rehanbhatti0317@gmail.com>
2026-03-13 17:36:04 +02:00
Adrien Gallouët b5e1212063
ggml : fix typo gmml (#20512)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-13 14:36:13 +01:00
Daniel Bevenius 8f974d2392
mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate (#20105)
This commit renames the function `mtmd_get_audio_bitrate` to
`mtmd_get_audio_sample_rate` to better reflect its purpose.

The motivation for this is that the function currently returns the audio
sample rate, not the bitrate (sample_rate × bit_depth × channels), and
that is how it is used in the code as well.
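
To make the distinction concrete, here is a small worked example of the formula above. The input values (16 kHz, 16-bit, mono) are illustrative assumptions, not values taken from mtmd.

```python
def bitrate_bps(sample_rate_hz, bit_depth, channels):
    """Uncompressed bitrate in bits per second: sample_rate * bit_depth * channels."""
    return sample_rate_hz * bit_depth * channels

sample_rate_hz = 16000  # what a "sample rate" getter would report
print(bitrate_bps(sample_rate_hz, bit_depth=16, channels=1))  # 256000
```

The two quantities differ by a factor of bit_depth × channels, so a function that returns 16000 here is reporting a sample rate, not a bitrate.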

This is a breaking change, but I believe mtmd is still in
experimental/development phase so it might be alright to simply rename.
2026-03-13 12:30:02 +01:00
Piotr Wilkin (ilintar) 2948e6049a
general: CONTRIBUTING.md - guidelines for quantization schemes (#19762)
* Guidelines for quantization schemes

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Change required precision from Q8 to FP16/BF16

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update CONTRIBUTING.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update CONTRIBUTING.md [no ci]

* Update CONTRIBUTING.md [no ci]

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-13 12:21:33 +01:00
Georgi Gerganov 73c9eb8ced
metal : fix l2 norm scale (#20493) 2026-03-13 11:43:20 +02:00
Daniel Bevenius 983df142a9
convert : fix/suppress pyright errors (#20442)
* convert : fix/suppress pyright errors

This commit fixes the pyright errors that are generated by pyright for
convert_hf_to_gguf.py.

The motivation for this is that running this locally generates errors
that CI does not, and it can be difficult to spot new errors. One use
case is when working on new models which cannot be run in CI due to
privacy. Having the ability to run pyright locally would be helpful
in these cases.

In the linked issue there is the mention of switching to `ty` which I
don't know anything about but in the meantime I would appreciate if we
could suppress these errors for now, and later perhaps revert this
commit.

With this change there are no errors but there are 4 informational
messages if the `mistral_common` package is installed. The
`--level error` flag can be used to suppress them.

Resolves: https://github.com/ggml-org/llama.cpp/issues/20417
2026-03-13 06:00:52 +01:00