Commit Graph

626 Commits

Author SHA1 Message Date
Xuan Son Nguyen e858b7a0a3 add minimal caps system 2026-01-02 16:28:04 +01:00
Xuan Son Nguyen 0f9f986ace test: add --output 2026-01-02 11:33:42 +01:00
Xuan Son Nguyen a66e4a4f5d make output a bit cleaner 2026-01-01 23:07:45 +01:00
Xuan Son Nguyen b23b5e3c01 make testing more flexible 2026-01-01 23:02:30 +01:00
Xuan Son Nguyen d34efd9626 rm type inference 2025-12-31 11:43:53 +01:00
Xuan Son Nguyen cbb37dd4cd improve function args handling 2025-12-31 11:29:40 +01:00
Xuan Son Nguyen 1b213ae5e7 add placeholder for tojson 2025-12-30 21:52:47 +01:00
Xuan Son Nguyen 4479c382ce demo: type inferrence 2025-12-30 17:26:23 +01:00
Xuan Son Nguyen 9c0fa6f810 rm workarounds 2025-12-30 16:07:23 +01:00
Xuan Son Nguyen 9e9a70f72f more fixes 2025-12-29 15:07:18 +01:00
Xuan Son Nguyen 026730e8e3 more fix, more tests 2025-12-29 12:53:31 +01:00
Xuan Son Nguyen 1cf25734a9 more tests 2025-12-29 10:53:32 +01:00
Xuan Son Nguyen 2a31c9a30c a lot of fixes 2025-12-29 00:38:29 +01:00
Xuan Son Nguyen 1784a57e7b impl global_from_json 2025-12-28 23:15:48 +01:00
Xuan Son Nguyen adad34f64d add filter_statement 2025-12-28 22:02:22 +01:00
Xuan Son Nguyen 9a8a45ff3b mostly works 2025-12-28 21:32:55 +01:00
Xuan Son Nguyen 45df0c91e7 testing more templates 2025-12-28 19:50:09 +01:00
Xuan Son Nguyen db09a7468d fix negate test 2025-12-28 19:07:01 +01:00
Xuan Son Nguyen acb0effa25 allow print source on exception 2025-12-28 18:45:41 +01:00
Xuan Son Nguyen 7f17608ea4 use shared_ptr for values 2025-12-28 17:46:25 +01:00
Xuan Son Nguyen 4331e9c8e9 keyword arguments and slicing array 2025-12-28 17:23:29 +01:00
Xuan Son Nguyen 45c194622e support binded functions 2025-12-28 15:33:14 +01:00
Xuan Son Nguyen 4ca114b095 track input string even after transformations 2025-12-28 12:48:35 +01:00
Xuan Son Nguyen 81310d29c1 render gemma tmpl ok 2025-12-28 12:04:23 +01:00
Xuan Son Nguyen 10835f2720 eval with is_user_input 2025-12-27 23:25:20 +01:00
Xuan Son Nguyen da7bbe5813 wip 2025-12-27 22:25:19 +01:00
Xuan Son Nguyen 15b3dbab05 add string builtins 2025-12-27 21:52:50 +01:00
Xuan Son Nguyen d8ef00e610 bin ops works! 2025-12-27 20:16:46 +01:00
Xuan Son Nguyen 8cea1ed6b0 parser ok 2025-12-27 12:55:01 +01:00
Xuan Son Nguyen 15b7c50e95 lexer 2025-12-25 21:08:51 +01:00
Jeff Bolz e3b35ddf1c
vulkan: Extend rope fusions to allow mrope (#18264)
Extend the test-backend-ops tests as well.
2025-12-22 11:03:13 -06:00
Johannes Gäßler 147a521636
tool/ex/tests: consistently free ctx, then model (#18168) 2025-12-22 11:00:37 +01:00
Jeff Bolz fd05c51cec
vulkan: fix im2col overflowing maxworkgroupcount (#18180) 2025-12-21 10:32:58 +01:00
Jeff Bolz b365c3ff01
vulkan/cuda: fix topk_moe with exp_probs_b (#18071)
I updated test_topk_moe to more closely match llm_graph_context::build_moe_ffn
and added coverage for exp_probs_b and some other missing combinations. This
exposed a bug in both CUDA and Vulkan backends where they were assuming the
input to argsort and the input to get_rows are the same. I'd like to optimize
this graph in another change, but for now just get it functional.

CUDA also had a bug where it got n_experts from the wrong place, leading to
GGML_ASSERT failures in some of the new tests.
2025-12-21 10:27:34 +01:00
Jeff Bolz 52ab19df63
tests: Avoid floating point precision false positives in SUM (#17471)
* tests: Avoid floating point precision false positives in SUM

* also apply to test_mean
2025-12-20 13:46:46 -06:00
Jeff Bolz 5182dd64cd
test-backend-ops: improve msvc build time (#18209) 2025-12-20 13:45:45 -06:00
Xuan-Son Nguyen 9e39a1e6a9
server: support load model on startup, support preset-only options (#18206)
* server: support autoload model, support preset-only options

* add docs

* load-on-startup

* fix

* Update common/arg.cpp

Co-authored-by: Pascal <admin@serveurperso.com>

---------

Co-authored-by: Pascal <admin@serveurperso.com>
2025-12-20 09:25:27 +01:00
Pascal 14931a826e
arg: fix order to use short form before long form (#18196)
* arg: fix order to use short form before long form

* arg: update doc

* arg: update test-arg-parser

* arg: address review feedback from ngxson

simplified to check first.length() <= last.length() only
fixed: --sampler-seq, --rerank, --draft ordering
note: middle positions in 3+ arg sets are not verified

* arg: update doc
2025-12-19 18:01:56 +01:00
Xuan-Son Nguyen 8ea958d4d9
model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106)
* ASR with LFM2-Audio-1.5B

* Set rope_theta

* Fix comment

* Remove rope_theta setting

* Address PR feedback

* rename functions to conformer

* remove some redundant ggml_cont

* fix missing tensor

* add prefix "a." for conv tensors

* remove redundant reshape

* clean up

* add test model

---------

Co-authored-by: Tarek Dakhran <tarek@liquid.ai>
2025-12-19 00:18:01 +01:00
Aldehir Rojas c05aa69f32
common : add nemotron 3 parsing (#18077)
* common : expose json-schema functionality to extract type info

* common : fix peg parser negation during needs_more_input

* common : add some defensive measures in constructed peg parser

* common : add nemotron nano 3 support

* common : add nemotron nano 3 tests

* remove debug line
2025-12-16 04:05:23 -06:00
ssweens 4529c660c8
kv-cache: Fix state restore fragmented cache (#17982)
* kv-cache : fix state restore with fragmented cache (#17527)

Change find_slot to allow non-contiguous allocation during state restore. Fixes 'failed to find available cells in kv cache' error when restoring state to fragmented cache.

* tests : update logic

* cleanup: tightened state_read_meta sig, added is_contiguous case

* fix: state_read_meta arg reorder loose ends

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-15 19:28:35 +02:00
Xuan-Son Nguyen 4d5ae24c0a
arg: fix common_params_parse not accepting negated arg (#17991) 2025-12-13 12:53:37 +01:00
Jeff Bolz 303f8615e9
vulkan: Multi-pass softmax for large number of cols (#17892)
When the number of cols is large, split each row across multiple workgroups.
There are three phases that communicate partial results through temp buffers:
(1) compute max partials
(2) take max of partials, compute sum(exp(x-max)) partials
(3) sum partials, compute scaled result
2025-12-13 10:04:29 +01:00
Jeff Bolz 07a10c1090
vulkan: Allow non-pow2 n_experts in topk_moe (#17872) 2025-12-13 08:40:04 +01:00
Xuan-Son Nguyen 380b4c984e
common: support negated args (#17919)
* args: support negated args

* update docs

* fix typo

* add more neg options

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* rm duplicated arg

* fix LLAMA_ARG_NO_HOST

* add test

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-12 23:58:53 +01:00
Piotr Wilkin (ilintar) 53ecd4fdb9
SOLVE_TRI extension to more dimensions (#17793)
* Extended TRI

* Fix whitespace

* chore: update webui build output

* Just use cuBLAS for everything...

* Merge both versions

* Remove incorrect imports causing failures for CI

* Still failing... remove all direct cublas imports and rely on common imports from "common.cuh"

* Defines for hipBlas

* Aaaand MUSA defines...

* I hate this job...

* Stupid typo...

* Update ggml/src/ggml-cuda/solve_tri.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-12-11 17:20:43 +01:00
Max Krasnyansky e1f4921980
Fix race conditions in threadpool when dealing with dynamic/frequent n_threads changes (#17748)
* tests: update barrier test to check for race condition in active threads

* cpu: combine n_graph and n_threads into a single atomic update

* tests: add multi-graph test for test_barrier
2025-12-10 12:32:23 -08:00
Georgi Gerganov 4dff236a52
ggml : remove GGML_KQ_MASK_PAD constant (#17910)
* ggml : remove GGML_KQ_MASK_PAD constant

* cont : remove comment
2025-12-10 20:53:16 +02:00
Xuan-Son Nguyen 6c2131773c
cli: new CLI experience (#17824)
* wip

* wip

* fix logging, add display info

* handle commands

* add args

* wip

* move old cli to llama-completion

* rm deprecation notice

* move server to a shared library

* move ci to llama-completion

* add loading animation

* add --show-timings arg

* add /read command, improve LOG_ERR

* add args for speculative decoding, enable show timings by default

* add arg --image and --audio

* fix windows build

* support reasoning_content

* fix llama2c workflow

* color default is auto

* fix merge conflicts

* properly fix color problem

Co-authored-by: bandoti <bandoti@users.noreply.github.com>

* better loading spinner

* make sure to clean color on force-exit

* also clear input files on "/clear"

* simplify common_log_flush

* add warning in mtmd-cli

* implement console writter

* fix data race

* add attribute

* fix llama-completion and mtmd-cli

* add some notes about console::log

* fix compilation

---------

Co-authored-by: bandoti <bandoti@users.noreply.github.com>
2025-12-10 15:28:59 +01:00
Aldehir Rojas 2fbe3b7bb7
common : add parser for ministral/mistral large 3/devstral 2 (#17713) 2025-12-09 17:31:04 -06:00