llama.cpp

Commit Graph

Author	SHA1	Message	Date
Leszek Hanusz	a0c5c26fb9	Fix calculation of total tokens after undo/redo	2026-02-05 02:33:39 +01:00
Leszek Hanusz	4659a36ffd	Add 42px min height to the statistics to avoid flickering height problems + remove unused imports	2026-02-04 18:44:22 +01:00
Leszek Hanusz	a574409432	Restore .gitignore	2026-02-04 18:12:45 +01:00
Leszek Hanusz	77dc99cd9a	Remove [DONE] check	2026-02-04 18:11:27 +01:00
Leszek Hanusz	031e426005	Run npm run format	2026-02-04 16:31:44 +01:00
Leszek Hanusz	393faf0166	Put completion api service in separate file	2026-02-04 16:29:53 +01:00
Leszek Hanusz	251ba9d72a	Put tokenize in a separate file	2026-02-04 15:58:54 +01:00
Leszek Hanusz	efd274ab3d	chore: update webui build output	2026-02-04 14:25:20 +01:00
Leszek Hanusz	ad3b8df38f	Remove currentConfig.model	2026-02-04 02:03:59 +01:00
Leszek Hanusz	f20b17a087	Remove inputContent var and use tokenize only when needed	2026-02-04 01:23:24 +01:00
Leszek Hanusz	9cf4742adb	Fix tokenize with router on	2026-02-04 00:21:56 +01:00
Leszek Hanusz	03077cf297	Merge branch 'master' into notebook	2026-02-03 03:04:31 +01:00
Leszek Hanusz	210dc6a2c0	Running npm run format	2026-02-03 02:27:10 +01:00
Leszek Hanusz	9dc75f2664	Fix npm run check errors	2026-02-03 02:22:32 +01:00
Leszek Hanusz	f42d889a47	Fix vertical alignment of Generate tooltip shortcut info	2026-02-03 02:14:28 +01:00
Leszek Hanusz	fb2095e815	Show total number of tokens by using tokenizer	2026-02-03 01:50:52 +01:00
lhez	91ea44e89b	opencl: refactor some ops, concat, repeat, tanh and scale (#19226) * opencl: refactor concat * opencl: refactor repeat * opencl: refactor tanh * opencl: enable fp16 for tanh * opencl: refactor scale * opencl: fix unused variables	2026-02-02 15:54:43 -08:00
Leszek Hanusz	3657a8a7ad	Implement shortcuts for the notebook page	2026-02-02 23:59:36 +01:00
Leszek Hanusz	7892b259cb	Add last undo/redo for notebook page	2026-02-02 22:39:07 +01:00
Leszek Hanusz	f041a864ed	Use same dialog for server errors on notebook page	2026-02-02 21:29:48 +01:00
Leszek Hanusz	11e3cd81ce	Protect window from accidental closure if the notebook is not empty as it is not saved	2026-02-02 21:15:24 +01:00
Sid Mohan	0dfcd3b607	jinja : add missing 'in' test to template engine (#19004 ) (#19239 ) * jinja : add missing 'in' test to template engine (#19004) The jinja template parser was missing the 'in' test from global_builtins(), causing templates using reject("in", ...), select("in", ...), or 'x is in(y)' to fail with "selectattr: unknown test 'in'". This broke tool-calling for Qwen3-Coder and any other model whose chat template uses the 'in' test. Added test_is_in supporting array, string, and object containment checks, mirroring the existing 'in' operator logic in runtime.cpp. Includes test cases for all three containment types plus reject/select filter usage. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * reuse test_is_in in binary op --------- Co-authored-by: Sid Mohan <sidmohan0@users.noreply.github.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-02-02 21:00:55 +01:00
Xuan-Son Nguyen	07a7412a3b	mtmd: add min/max pixels gguf metadata (#19273)	2026-02-02 20:59:06 +01:00
Leszek Hanusz	301c3fec7e	Add generation statistics to notebook page	2026-02-02 18:39:46 +01:00
Aman Gupta	9f682fb640	ggml-cpu: FA split across kv for faster TG (#19209) * ggml-cpu: split across kv for faster TG * simplify sinks application * add ref impl	2026-02-03 01:19:55 +08:00
Matthieu Coudron	a3fa035822	server: print actual model name in 'model not found' error (#19117) Experimenting with AI, my environment gets messy fast and it's not always easy to know what model my software is trying to load. This helps with troubleshooting. Before: Error: { code = 400, message = "model not found", type = "invalid_request_error" } After: Error: { code = 400, message = "model 'toto' not found", type = "invalid_request_error" }	2026-02-02 16:55:27 +01:00
Leszek Hanusz	8a71126e5b	Autoscroll the notebook textarea depending on config parameter	2026-02-02 16:19:53 +01:00
Leszek Hanusz	e80ba11778	Fix sidebar behavior same as chat pages	2026-02-02 15:46:12 +01:00
Aman Gupta	15818ac44c	ci: add test-backend-ops test for CPU (#19268)	2026-02-02 22:40:28 +08:00
Leszek Hanusz	ff2f0bba4a	Remove console logs	2026-02-02 15:06:51 +01:00
Neo Zhang	bf38346d13	Remove support for Nvidia & AMD GPU, because the oneAPI plugin for Nvidia & AMD GPU is unavailable: download/installation channels are out of work. (#19246 ) User can't build up the software for Nvidia & AMD GPU. rm the oneMath since it is only used in NV and AMD code path.	2026-02-02 21:06:21 +08:00
Tamar	4d5e972673	sycl: implement GGML_OP_TOP_K (#19242)	2026-02-02 21:05:51 +08:00
Georgi Gerganov	6fdddb4987	metal : support virtual devices (#18919) * metal : support virtual devices * cont : manage buffer type context memory * metal : add events * cont : implement cpy_tensor_async	2026-02-02 14:29:44 +02:00
Daniel Bevenius	6156ae5111	model-conversion : add debug option to conversion script (#19265 ) This commit adds a debug option to the model conversion script to enable using the Python debugger (pdb) during model conversion. The motivation for this is that I've found myself adding this a few times now and it would be quicker to have this flag as an option and a makefile target/recipe for it.	2026-02-02 11:29:57 +01:00
Johannes Gäßler	59377a6c87	ggml-backend: fix async set/get fallback sync (#19179)	2026-02-02 10:00:05 +01:00
Georgi Gerganov	1239267cc4	authors : update (#19263) [no ci]	2026-02-02 08:51:25 +02:00
Christian Kastner	7a4ca3cbd9	docs : Minor cleanups (#19252) * Update old URLs to github.com/ggml-org/ * Bump copyrights	2026-02-02 08:38:55 +02:00
Sascha Rogmann	b4d05a3d2f	spec : various improvements to ngram-map + docs (#19253) * spec: ngram-map and reasoning chats * spec: add t_begin and t_accept * ngram-map : add internal hash map * docs : update ngram-map, add ngram-mod * docs : fix ngram-map-k * docs : differences between implementations	2026-02-02 08:26:58 +02:00
Nikhil Jain	2dc3ce2166	Remove pipeline cache mutexes (#19195 ) * Remove mutex for pipeline caches, since they are now per-thread. * Add comment * Run clang-format * Cleanup * Run CI again * Run CI once more * Run clang-format	2026-02-01 18:47:29 -08:00
Leszek Hanusz	c9f9863268	Add .agent/ to gitignore Fix buttons Fix model loading with router enabled remove stats for now lint	2026-02-01 23:20:34 +01:00
Max Krasnyansky	3bc8d2cf23	Bump cmake max version (needed for Windows on Snapdragon builds) (#19188 ) * Bump max cmake version (needed for Windows on Snapdragon builds) * cmake: move max version setting into ggml/CMakeLists	2026-02-01 14:13:38 -08:00
Alexis Williams	8a98ba4582	nix: fix allowUnfreePredicate for packages with multiple licenses (#19237 ) The allowUnfreePredicate in pkgsCuda was wrapping p.meta.license in a list unconditionally. This fails when meta.license is already a list of licenses, as it creates a nested list and then tries to access .free and .shortName on the inner list. Use lib.toList instead, which correctly handles both cases: - Single license attrset -> wraps in list - List of licenses -> returns unchanged	2026-02-01 22:10:48 +02:00
Neo Zhang	2634ed207a	create test.sh to enhance the parameters for testing, update the guide, rm useless script (#19243)	2026-02-01 18:24:00 +08:00
Leszek Hanusz	3af9b34aa2	Refine Notebook UI: improved layout, added stats and model info	2026-01-31 23:59:45 +01:00
Leszek Hanusz	6d96745375	Implement Notebook interface	2026-01-31 22:14:28 +01:00
Matthieu Coudron	41ea26144e	nix: fix nix develop .#python-scripts (#19218 ) Without this I get: > * Getting build dependencies for wheel... > * Building wheel... > Successfully built gguf-0.17.1-py3-none-any.whl > Finished creating a wheel... > Finished executing pypaBuildPhase > Running phase: pythonRuntimeDepsCheckHook > Executing pythonRuntimeDepsCheck > Checking runtime dependencies for gguf-0.17.1-py3-none-any.whl > - requests not installed For full logs, run: nix log /nix/store/x0c4a251l68bvdgang9d8v2fsmqay8a4-python3.12-gguf-0.0.0.drv I changed a bit the style to make it more terse ~> more elegant in my opinion.	2026-01-31 18:01:46 +02:00
nullname	89f10baad5	ggml-hexagon: flash-attention and reduce-sum optimizations (#19141 ) * wip * ggml-hexagon: add vectorized dot product function for FP32 and FP16 accumulation * ggml-hexagon: optimize dot product functions for FP16 and FP32 with new vectorized implementations * wip * ggml-hexagon: optimize hvx_vec_dump_f32_n and hvx_vec_reduce_sum_qf32x2 functions for improved performance * ggml-hexagon: refactor dot product functions to use a common loading function for improved readability * optimize vector dot product functions to use unified reduction for improved performance * wip * ggml-hexagon: add vectorized dot product function for FP32 and FP16 accumulation * ggml-hexagon: optimize dot product functions for FP16 and FP32 with new vectorized implementations * wip * ggml-hexagon: optimize hvx_vec_dump_f32_n and hvx_vec_reduce_sum_qf32x2 functions for improved performance * ggml-hexagon: refactor dot product functions to use a common loading function for improved readability * optimize vector dot product functions to use unified reduction for improved performance * hexagon: optimize reduce-sum for v75+ * hexagon: always keep row_sums in sf/fp32 * ggml-hexagon: enhance directory checks for HEXAGON_SDK_ROOT and HEXAGON_TOOLS_ROOT * fix compiling error after rebase --------- Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>	2026-01-30 21:14:20 -08:00
EugeoSynthesisThirtyTwo	3dd95914d0	quantize: add option --tensor-type-file to llama-quantize (#18572 ) * add option --tensor-type-file to llama-quantize, but it raises an error. * add error message when file not found * quantize: update help menu, fix CI Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>	2026-01-31 11:39:21 +08:00
tc-mb	ec6c7421e4	mtmd: support MiniCPM-o 4.5 (vision only) (#19211) Signed-off-by: tc-mb <caitianchi@modelbest.cn>	2026-01-30 23:19:30 +01:00
Daniele Pinna	1488339138	lookup, lookahead: fix crash when n_ctx not specified (#18729 ) * lookup, lookahead: fix crash when n_ctx not specified Since PR #16653 (Dec 15, 2025), the default n_ctx is 0 to enable automatic GPU memory fitting. This causes llama-lookup and llama-lookahead to crash when run without explicit -c flag: GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded") Root cause: Both examples use params.n_ctx directly for batch initialization, but params.n_ctx remains 0 even after the context is properly initialized to n_ctx_train internally. Bug history: - Nov 2023: lookahead.cpp created (PR #4207) with params.n_ctx pattern - Dec 2023: lookup.cpp created (PR #4484) with same pattern - Nov 2024: default n_ctx changed to 4096 (PR #10136) - bug dormant - Dec 2025: default n_ctx changed to 0 (PR #16653) - bug activated The bug was dormant for 2+ years because params.n_ctx defaulted to 512, then 4096. PR #16653 changed it to 0 for GPU auto-fitting, triggering the crash. Fix: Use llama_n_ctx(ctx) to get the actual runtime context size, matching the pattern already used elsewhere in lookup.cpp (line 72) and in speculative.cpp/speculative-simple.cpp. Tested: llama-lookup now works without -c flag (12.5% acceptance on Gemma-3-1B). Note: llama-lookahead has a separate pre-existing issue with sequence initialization (n_seq_max=1 vs W+G+1 needed) that is unrelated to this fix. * lookahead: fix n_seq_max and kv_unified configuration Lookahead decoding requires: - W + G + 1 = 31 sequences for parallel Jacobi decoding - Unified KV cache for coupled sequences in batch splitting These requirements were broken after PR #14482 changed validation logic. Consolidates fix from PR #18730 per maintainer request. Commit message drafted with Claude.	2026-01-30 22:10:24 +02:00



7944 Commits
