Commit Graph

7944 Commits

Author SHA1 Message Date
Leszek Hanusz a0c5c26fb9 Fix calculation of total tokens after undo/redo 2026-02-05 02:33:39 +01:00
Leszek Hanusz 4659a36ffd Add 42px min height to the statistics to avoid flickering height problems + remove unused imports 2026-02-04 18:44:22 +01:00
Leszek Hanusz a574409432 Restore .gitignore 2026-02-04 18:12:45 +01:00
Leszek Hanusz 77dc99cd9a Remove [DONE] check 2026-02-04 18:11:27 +01:00
Leszek Hanusz 031e426005 Run npm run format 2026-02-04 16:31:44 +01:00
Leszek Hanusz 393faf0166 Put completion api service in separate file 2026-02-04 16:29:53 +01:00
Leszek Hanusz 251ba9d72a Put tokenize in a separate file 2026-02-04 15:58:54 +01:00
Leszek Hanusz efd274ab3d chore: update webui build output 2026-02-04 14:25:20 +01:00
Leszek Hanusz ad3b8df38f Remove currentConfig.model 2026-02-04 02:03:59 +01:00
Leszek Hanusz f20b17a087 Remove inputContent var and use tokenize only when needed 2026-02-04 01:23:24 +01:00
Leszek Hanusz 9cf4742adb Fix tokenize with router on 2026-02-04 00:21:56 +01:00
Leszek Hanusz 03077cf297 Merge branch 'master' into notebook 2026-02-03 03:04:31 +01:00
Leszek Hanusz 210dc6a2c0 Running npm run format 2026-02-03 02:27:10 +01:00
Leszek Hanusz 9dc75f2664 Fix npm run check errors 2026-02-03 02:22:32 +01:00
Leszek Hanusz f42d889a47 Fix vertical alignment of Generate tooltip shortcut info 2026-02-03 02:14:28 +01:00
Leszek Hanusz fb2095e815 Show total number of tokens by using tokenizer 2026-02-03 01:50:52 +01:00
lhez 91ea44e89b
opencl: refactor some ops, concat, repeat, tanh and scale (#19226)
* opencl: refactor concat

* opencl: refactor repeat

* opencl: refactor tanh

* opencl: enable fp16 for tanh

* opencl: refactor scale

* opencl: fix unused variables
2026-02-02 15:54:43 -08:00
Leszek Hanusz 3657a8a7ad Implement shortcuts for the notebook page 2026-02-02 23:59:36 +01:00
Leszek Hanusz 7892b259cb Add last undo/redo for notebook page 2026-02-02 22:39:07 +01:00
Leszek Hanusz f041a864ed Use same dialog for server errors on notebook page 2026-02-02 21:29:48 +01:00
Leszek Hanusz 11e3cd81ce Protect window from accidental closure if the notebook is not empty as it is not saved 2026-02-02 21:15:24 +01:00
Sid Mohan 0dfcd3b607
jinja : add missing 'in' test to template engine (#19004) (#19239)
* jinja : add missing 'in' test to template engine (#19004)

The jinja template parser was missing the 'in' test from
global_builtins(), causing templates using reject("in", ...),
select("in", ...), or 'x is in(y)' to fail with
"selectattr: unknown test 'in'".

This broke tool-calling for Qwen3-Coder and any other model
whose chat template uses the 'in' test.

Added test_is_in supporting array, string, and object containment
checks, mirroring the existing 'in' operator logic in runtime.cpp.

Includes test cases for all three containment types plus
reject/select filter usage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* reuse test_is_in in binary op

---------

Co-authored-by: Sid Mohan <sidmohan0@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-02-02 21:00:55 +01:00
Xuan-Son Nguyen 07a7412a3b
mtmd: add min/max pixels gguf metadata (#19273) 2026-02-02 20:59:06 +01:00
Leszek Hanusz 301c3fec7e Add generation statistics to notebook page 2026-02-02 18:39:46 +01:00
Aman Gupta 9f682fb640
ggml-cpu: FA split across kv for faster TG (#19209)
* ggml-cpu: split across kv for faster TG

* simplify sinks application

* add ref impl
2026-02-03 01:19:55 +08:00
Matthieu Coudron a3fa035822
server: print actual model name in 'model not found" error (#19117)
Experimenting with AI, my environment gets messy fast and it's not
always easy to know what model my software is trying to load. This helps
with troubleshooting.

before:

Error: {
  code = 400,
  message = "model not found",
  type = "invalid_request_error"
}

After:

Error: {
  code = 400,
  message = "model 'toto' not found",
  type = "invalid_request_error"
}
2026-02-02 16:55:27 +01:00
Leszek Hanusz 8a71126e5b Autoscroll the notebook textarea depending on config parameter 2026-02-02 16:19:53 +01:00
Leszek Hanusz e80ba11778 Fix sidebar behavior same as chat pages 2026-02-02 15:46:12 +01:00
Aman Gupta 15818ac44c
ci: add test-backend-ops test for CPU (#19268) 2026-02-02 22:40:28 +08:00
Leszek Hanusz ff2f0bba4a Remove console logs 2026-02-02 15:06:51 +01:00
Neo Zhang bf38346d13
Remove support for Nvidia & AMD GPU, because the oneAPI plugin for Nvidia & AMD GPU is unavailable: download/installation channels are out of work. (#19246)
User can't build up the software for Nvidia & AMD GPU.
rm the oneMath since it is only used in NV and AMD code path.
2026-02-02 21:06:21 +08:00
Tamar 4d5e972673
sycl: implement GGML_OP_TOP_K (#19242) 2026-02-02 21:05:51 +08:00
Georgi Gerganov 6fdddb4987
metal : support virtual devices (#18919)
* metal : support virtual devices

* cont : manage buffer type context memory

* metal : add events

* cont : implement cpy_tensor_async
2026-02-02 14:29:44 +02:00
Daniel Bevenius 6156ae5111
model-conversion : add debug option to conversion script (#19265)
This commit adds a debug option to the model conversion script to enable
using the Python debugger (pdb) during model conversion.

The motivation for this is that I've found myself adding this a few
times now and it would be quicker to have this flag as an option and a
makefile target/recipe for it.
2026-02-02 11:29:57 +01:00
Johannes Gäßler 59377a6c87
ggml-backend: fix async set/get fallback sync (#19179) 2026-02-02 10:00:05 +01:00
Georgi Gerganov 1239267cc4
authors : update (#19263)
[no ci]
2026-02-02 08:51:25 +02:00
Christian Kastner 7a4ca3cbd9
docs : Minor cleanups (#19252)
* Update old URLs to github.com/ggml-org/

* Bump copyrights
2026-02-02 08:38:55 +02:00
Sascha Rogmann b4d05a3d2f
spec : various improvements ton ngram-map + docs (#19253)
* spec: ngram-map and reasoning chats

* spec: add t_begin and t_accept

* ngram-map : add internal hash map

* docs : update ngram-map, add ngram-mod

* docs : fix ngram-map-k

* docs : differences between implementations
2026-02-02 08:26:58 +02:00
Nikhil Jain 2dc3ce2166
Remove pipeline cache mutexes (#19195)
* Remove mutex for pipeline caches, since they are now per-thread.

* Add comment

* Run clang-format

* Cleanup

* Run CI again

* Run CI once more

* Run clang-format
2026-02-01 18:47:29 -08:00
Leszek Hanusz c9f9863268 Add .agent/ to gitignore
Fix buttons
Fix model loading with router enabled
remove stats for now
lint
2026-02-01 23:20:34 +01:00
Max Krasnyansky 3bc8d2cf23
Bump cmake max version (needed for Windows on Snapdragon builds) (#19188)
* Bump max cmake version (needed for Windows on Snapdragon builds)

* cmake: move max version setting into ggml/CMakeLists
2026-02-01 14:13:38 -08:00
Alexis Williams 8a98ba4582
nix: fix allowUnfreePredicate for packages with multiple licenses (#19237)
The allowUnfreePredicate in pkgsCuda was wrapping p.meta.license in a
list unconditionally. This fails when meta.license is already a list
of licenses, as it creates a nested list and then tries to access
.free and .shortName on the inner list.

Use lib.toList instead, which correctly handles both cases:
- Single license attrset -> wraps in list
- List of licenses -> returns unchanged
2026-02-01 22:10:48 +02:00
Neo Zhang 2634ed207a
create test.sh to enhance the parameters for testing, update the guide, rm useless script (#19243) 2026-02-01 18:24:00 +08:00
Leszek Hanusz 3af9b34aa2 Refine Notebook UI: improved layout, added stats and model info 2026-01-31 23:59:45 +01:00
Leszek Hanusz 6d96745375 Implement Notebook interface 2026-01-31 22:14:28 +01:00
Matthieu Coudron 41ea26144e
nix: fix nix develop .#python-scripts (#19218)
Without this I get:

> * Getting build dependencies for wheel...
> * Building wheel...
> Successfully built gguf-0.17.1-py3-none-any.whl
> Finished creating a wheel...
> Finished executing pypaBuildPhase
> Running phase: pythonRuntimeDepsCheckHook
> Executing pythonRuntimeDepsCheck
> Checking runtime dependencies for gguf-0.17.1-py3-none-any.whl
>   - requests not installed
For full logs, run:
    nix log /nix/store/x0c4a251l68bvdgang9d8v2fsmqay8a4-python3.12-gguf-0.0.0.drv

I changed a bit the style to make it more terse ~> more elegant in my
opinion.
2026-01-31 18:01:46 +02:00
nullname 89f10baad5
ggml-hexagon: flash-attention and reduce-sum optimizations (#19141)
* wip

* ggml-hexagon: add vectorized dot product function for FP32 and FP16 accumulation

* ggml-hexagon: optimize dot product functions for FP16 and FP32 with new vectorized implementations

* wip

* ggml-hexagon: optimize hvx_vec_dump_f32_n and hvx_vec_reduce_sum_qf32x2 functions for improved performance

* ggml-hexagon: refactor dot product functions to use a common loading function for improved readability

* optimize vector dot product functions to use unified reduction for improved performance

* wip

* ggml-hexagon: add vectorized dot product function for FP32 and FP16 accumulation

* ggml-hexagon: optimize dot product functions for FP16 and FP32 with new vectorized implementations

* wip

* ggml-hexagon: optimize hvx_vec_dump_f32_n and hvx_vec_reduce_sum_qf32x2 functions for improved performance

* ggml-hexagon: refactor dot product functions to use a common loading function for improved readability

* optimize vector dot product functions to use unified reduction for improved performance

* hexagon: optimize reduce-sum for v75+

* hexagon: always keep row_sums in sf/fp32

* ggml-hexagon: enhance directory checks for HEXAGON_SDK_ROOT and HEXAGON_TOOLS_ROOT

* fix compiling error after rebase

---------

Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
2026-01-30 21:14:20 -08:00
EugeoSynthesisThirtyTwo 3dd95914d0
quantize: add option --tensor-type-file to llama-quantize (#18572)
* add option --tensor-type-file to llama-quantize, but it raises an error.

* add error message when file not found

* quantize: update help menu, fix CI

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>
2026-01-31 11:39:21 +08:00
tc-mb ec6c7421e4
mtmd: support MiniCPM-o 4.5(vision only) (#19211)
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
2026-01-30 23:19:30 +01:00
Daniele Pinna 1488339138
lookup, lookahead: fix crash when n_ctx not specified (#18729)
* lookup, lookahead: fix crash when n_ctx not specified

Since PR #16653 (Dec 15, 2025), the default n_ctx is 0 to enable automatic
GPU memory fitting. This causes llama-lookup and llama-lookahead to crash
when run without explicit -c flag:

    GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded")

Root cause: Both examples use params.n_ctx directly for batch initialization,
but params.n_ctx remains 0 even after the context is properly initialized
to n_ctx_train internally.

Bug history:
- Nov 2023: lookahead.cpp created (PR #4207) with params.n_ctx pattern
- Dec 2023: lookup.cpp created (PR #4484) with same pattern
- Nov 2024: default n_ctx changed to 4096 (PR #10136) - bug dormant
- Dec 2025: default n_ctx changed to 0 (PR #16653) - bug activated

The bug was dormant for 2+ years because params.n_ctx defaulted to 512,
then 4096. PR #16653 changed it to 0 for GPU auto-fitting, triggering
the crash.

Fix: Use llama_n_ctx(ctx) to get the actual runtime context size, matching
the pattern already used elsewhere in lookup.cpp (line 72) and in
speculative.cpp/speculative-simple.cpp.

Tested: llama-lookup now works without -c flag (12.5% acceptance on
Gemma-3-1B).

Note: llama-lookahead has a separate pre-existing issue with sequence
initialization (n_seq_max=1 vs W+G+1 needed) that is unrelated to this fix.

* lookahead: fix n_seq_max and kv_unified configuration

Lookahead decoding requires:
- W + G + 1 = 31 sequences for parallel Jacobi decoding
- Unified KV cache for coupled sequences in batch splitting

These requirements were broken after PR #14482 changed validation logic.

Consolidates fix from PR #18730 per maintainer request.

Commit message drafted with Claude.
2026-01-30 22:10:24 +02:00