Commit Graph

1667 Commits

Author SHA1 Message Date
Georgi Gerganov acf1b5d842
Merge 1c128d941e into 6b949d1078 2026-04-01 14:05:51 +03:00
Adrien Gallouët 41361c8599
common : move up common_init() and fix Windows UTF-8 logs (#21176)
The build info is now only for debug, so we avoid the duplicate
with `--version`.

The UTF-8 setup at the beginning is needed to avoid logging
garbage on Windows.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-31 12:53:41 +02:00
Sigbjørn Skjæret e2eb39e81c
ci : bump ty to 0.0.26 (#21156)
* fix incorrect type ignore comments

* bump ty to 0.0.26
2026-03-30 09:29:15 +02:00
Georgi Gerganov 1c128d941e
remove junk 2026-03-29 17:31:04 +03:00
Neo Zhang afe65aa282
[SYCL] Enhance build script to use half cores to build, avoid OS hang (#21093)
* use half cores to build, avoid OS hang

* reduce the amount of output text to shorten test time

* avoid returning 0
2026-03-29 09:02:45 +08:00
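The two ideas in the commit above (build with half the cores, and never end up with 0 jobs) can be sketched as follows. This is only an illustration and not the actual SYCL build script; the helper name `build_jobs` is hypothetical.

```python
import os


def build_jobs(total_cores=None):
    """Return a parallel job count: half the cores, but never less than 1."""
    if total_cores is None:
        # os.cpu_count() may return None when the count cannot be determined.
        total_cores = os.cpu_count() or 1
    # Integer division yields 0 on a single-core machine, so clamp to 1.
    return max(1, total_cores // 2)
```

A build wrapper could then pass `build_jobs()` as the value of its parallel-jobs option instead of the full core count.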
yikechayedan 406f4e3f61
android : fix-pointer-dangling (#20974) 2026-03-25 11:51:26 +02:00
Sigbjørn Skjæret 29b28a9824
ci : switch from pyright to ty (#20826)
* type fixes

* switch to ty

* tweak rules

* tweak more rules

* more tweaks

* final tweak

* use common import-not-found rule
2026-03-21 08:54:34 +01:00
Ray Xu 8d880ac012
examples : fix empty items in json_schema_to_grammar.py [no ci] (#19968)
* Fix logic for retrieving schema items in `json_schema_to_grammar.py`

If `schema['items']` is `{}` and `'prefixItems' not in schema`, the original code here raises an error because `{}` is falsy.

I think if `schema['items']` is `{}`, then items should just be `{}`

* Apply suggestion from @CISC

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Add tests for arrays with empty items

Add two unit tests to `tests/test-json-schema-to-grammar.cpp` that validate handling of arrays when 'items' is an empty schema and when 'prefixItems' is present alongside an empty 'items'. Both tests expect the same generated grammar, ensuring the JSON Schema->grammar conversion treats an empty 'items' schema (and the presence of 'prefixItems') correctly and covering this edge case.

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-10 14:38:18 +01:00
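The falsy-`{}` pitfall described in the commit above can be reproduced in a few lines. This is a minimal sketch under assumptions: the two helper functions are hypothetical and are not the code from `json_schema_to_grammar.py`; they only contrast a truthiness check with a key-presence check.

```python
def get_items_truthy(schema):
    # `{}` is falsy, so `or` falls through to schema["prefixItems"],
    # which raises KeyError when that key is absent.
    return schema.get("items") or schema["prefixItems"]


def get_items_by_key(schema):
    # Check for key presence instead of truthiness, so an empty
    # `items` schema is returned as-is.
    if "items" in schema:
        return schema["items"]
    return schema["prefixItems"]
```

With `{"type": "array", "items": {}}` the first helper raises `KeyError`, while the second returns `{}`.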
Piotr Wilkin (ilintar) 566059a26b
Autoparser - complete refactoring of parser architecture (#18675)
* Autoparser - full single commit squish

* Final pre-merge changes: minor fixes, Kimi 2.5 model parser
2026-03-06 21:01:00 +01:00
Marcel Petrick 92f7da00b4
chore : correct typos [no ci] (#20041)
* fix(docs): correct typos found during code review

Non-functional changes only:
- Fixed minor spelling mistakes in comments
- Corrected typos in user-facing strings
- No variables, logic, or functional code was modified.

Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>

* Update docs/backend/CANN.md

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

* Revert "Auxiliary commit to revert individual files from 846d1c301281178efbc6ce6060ad34c1ebe45af8"

This reverts commit 02fcf0c7db661d5ff3eff96b2b2db9fdb7213256.

* Update tests/test-backend-ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-backend-ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-05 08:50:21 +01:00
SamareshSingh cb8f4fa3f8
Fix locale-dependent float printing in GGUF metadata (#17331)
* Set C locale for consistent float formatting across all binaries.

* Add C locale setting to all tools binaries

Add std::setlocale(LC_NUMERIC, "C") to all 16 binaries in the tools/
directory to ensure consistent floating-point formatting.

* Apply suggestion from @JohannesGaessler

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-04 09:30:40 +01:00
Georgi Gerganov a3405d4260
track total time 2026-02-23 21:22:02 +02:00
Daniel Bevenius 72b44c0d21
model-conversion : merge inspect-org-model.py with tensor-info.py (#19823)
This commit replaces/merges the inspect-org-model.py script with the
contents of the tensor-info.py script. The merged script has also been
updated to print tensor sizes, which was the only thing that
tensor-info.py did not do before.

The motivation for this is that tensor-info.py does not load the tensor
weights which can be time consuming for larger models. And also now that
both are doing almost the same thing it makes sense to just have one and
not two scripts to maintain.
2026-02-23 14:15:16 +01:00
Daniel Bevenius 2b6dfe824d
llama : remove write/read of output ids/logits/embeddings (#18862)
* llama : remove write/read of output ids/logits/embeddings

This commit removes the write/read of output ids, logits and
embeddings from the llama context state.

Refs: https://github.com/ggml-org/llama.cpp/pull/18862#issuecomment-3756330941

* completion : add replaying of session state

This commit updates the session handling in the completion tool to handle
the fact that logits are no longer stored in the session file. Instead, we
need to replay the last token to get the logits for sampling.

* common : add common_prompt_batch_decode function

This commit adds a new function which is responsible for decoding the
prompt and optionally handling the saving of session data.

* update save-state.cpp to use llama_state_load_file

This commit updates the save-load-state example to utilize the new
llama_state_load_file function for loading the model state from a file.
And it also replays the last token after loading since this state is now
stored before the last token is processed.

* examples : set n_seq_max = 2 for ctx3

This commit updates the save-load-state example to set the n_seq_max
parameter to 2 when initializing the ctx3 context.

The motivation for this change is that with n_parallel/n_seq_max set to 1
the context only supports one sequence, but the test later tries to
use a second sequence, which results in the following error:
```console
main : loaded state with 4 tokens
main : seq 0 copied, 225760 bytes
main : kv cache cleared
find_slot: seq_id=1 >= n_seq_max=1 Try using a bigger --parallel value
state_read_meta: failed to find available cells in kv cache
```
This seems to only happen for recurrent/hybrid models.
2026-02-23 07:04:30 +01:00
Daniel Bevenius 2b089c7758
model-conversion : add option to print tensor values (#19692)
This commit updates the tensor-info.py script to support the option to
print the first N values of a tensor when displaying its information.

The motivation for this is that it can be useful to inspect some actual
values in addition to the shapes of the tensors.
2026-02-17 20:43:22 +01:00
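The "first N values" option from the commit above could look roughly like this. It is a sketch only: the function name and output format are hypothetical and do not come from tensor-info.py, and the tensor is modelled as a plain iterable of numbers.

```python
from itertools import islice


def format_tensor_info(name, shape, values, n_values=0):
    """Format a tensor's name and shape, optionally with its first n values."""
    line = f"{name}: shape={list(shape)}"
    if n_values > 0:
        # islice stops after n_values items, so the whole tensor is never
        # materialised just to show a preview.
        preview = list(islice(values, n_values))
        line += f" first {len(preview)} values={preview}"
    return line
```

Leaving `n_values` at 0 keeps the shape-only output, so the preview stays opt-in.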
Daniel Bevenius 667b694278
model-conversion : make printing of config values optional (#19681)
* model-conversion : make printing of config values optional

This commit updates run-org-model.py to make the printing of model
configuration values optional.

The motivation for this change is that not all models have these
configuration values defined and those that do not will error when
running this script. With these changes we print each value if it
exists, or a default value otherwise.

We could optionally just remove them but it can be useful to see these
values when running the original model.
2026-02-17 10:46:53 +01:00
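Printing a config value only when it exists, with a fallback otherwise, is commonly done with `getattr` and a default. The sketch below assumes a config object with plain attributes; `describe_config` and the `"n/a"` default are hypothetical and not taken from run-org-model.py.

```python
from types import SimpleNamespace


def describe_config(config, names, default="n/a"):
    """Return 'name: value' lines, using a default for missing attributes."""
    # getattr with a third argument never raises AttributeError.
    return [f"{name}: {getattr(config, name, default)}" for name in names]


# Stand-in for a model config that lacks some of the usual fields.
example_config = SimpleNamespace(vocab_size=32000)
example_lines = describe_config(example_config, ["vocab_size", "hidden_size"])
```

Here `example_lines` holds one line for the value that exists and one line with the default for the missing `hidden_size`.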
Georgi Gerganov c0c3e428dd
refactor 2026-02-16 23:02:45 +02:00
Georgi Gerganov 7f049860b4
reasoning and error handling 2026-02-16 22:16:15 +02:00
Georgi Gerganov 2ffa45edfc
add tokens 2026-02-16 21:52:54 +02:00
Georgi Gerganov 9c29be1177
store full response 2026-02-16 21:44:29 +02:00
Georgi Gerganov 013963cfd5
add html 2026-02-16 21:22:06 +02:00
Georgi Gerganov e2e998a2d6
fix prompts 2026-02-16 21:02:25 +02:00
Georgi Gerganov 6c41664b8b
simplify 2026-02-16 19:50:27 +02:00
Georgi Gerganov 7b84af8051
fix counts 2026-02-16 16:38:31 +02:00
Georgi Gerganov 60a501e138
cleanup 2026-02-16 16:31:14 +02:00
Georgi Gerganov e6e777cfb3
resume eval 2026-02-16 16:21:36 +02:00
Georgi Gerganov ad3a54eb68
ignore errors 2026-02-16 15:23:23 +02:00
Georgi Gerganov c6d70b9bea
add AGENTS.md 2026-02-16 13:13:35 +02:00
Georgi Gerganov de956a6ca8
cleanup 2026-02-16 12:02:16 +02:00
Georgi Gerganov 350e7c1409
datasets : fix aime2025 2026-02-16 11:55:57 +02:00
Georgi Gerganov db10dda1f3
grade : improve regex + logs 2026-02-16 11:51:36 +02:00
Georgi Gerganov 52759bf078
grader : update prompt 2026-02-16 11:17:53 +02:00
Georgi Gerganov 99e3c3d02c
datasets : add aime2025 2026-02-16 11:07:54 +02:00
Georgi Gerganov c6315655b7
cont 2026-02-16 10:56:58 +02:00
Georgi Gerganov f762a71d56
grader : improve example answers 2026-02-16 10:51:41 +02:00
Georgi Gerganov 73e61d5b75
rename 2026-02-16 10:30:10 +02:00
Georgi Gerganov cffd268bb3
add gpqa + sampling + docs 2026-02-16 00:52:33 +02:00
Georgi Gerganov e8a807519a
datasets : add gsm8k 2026-02-15 23:19:46 +02:00
Georgi Gerganov 1db8428f00
remove old files 2026-02-15 22:16:54 +02:00
Georgi Gerganov 7751ae2796
docs 2026-02-15 22:15:50 +02:00
Georgi Gerganov d2b10302ce
improve grader 2026-02-15 22:12:02 +02:00
Georgi Gerganov 68dde884d6
minor 2026-02-15 21:21:40 +02:00
Georgi Gerganov fd90796da2
eval : support multiple dataset runs 2026-02-15 21:08:24 +02:00
Georgi Gerganov 8156d549f6
sim : fix answer matching 2026-02-15 21:08:24 +02:00
Georgi Gerganov 9695e6feb4
test : fix path 2026-02-15 21:08:24 +02:00
Georgi Gerganov fb1481d60d
eval : add prompts 2026-02-15 21:08:24 +02:00
Georgi Gerganov 812ae13ec1
eval : print progress 2026-02-15 21:08:24 +02:00
Georgi Gerganov e79e8d02d5
examples: add task summary table to llama-eval-new.py 2026-02-15 21:08:23 +02:00
Georgi Gerganov a939f4c47e
docs: update llama-eval-discussion.md with threading and model parameter updates
- Add threading support implementation details
- Document ThreadPoolExecutor usage and thread safety
- Add model parameter implementation details
- Include testing results for both features
2026-02-15 21:08:23 +02:00
Georgi Gerganov 62b04cef54
examples: add threading support and model parameter to llama-eval-new.py
- Add ThreadPoolExecutor for parallel request processing controlled by --threads
- Add --model argument to specify model name in request data
- Refactor process() to use thread-safe _process_single_case() method
- Update progress tracking to work with concurrent execution
2026-02-15 21:08:23 +02:00
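The structure described in the commit above (a `process()` method that fans cases out to a thread-safe `_process_single_case()` via `ThreadPoolExecutor`) can be sketched as follows. This is an illustration under assumptions, not the code in llama-eval-new.py: the `Evaluator` class, the fake "answer" computation and the `done` counter are hypothetical, and a real run would send an HTTP request where the placeholder is.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


class Evaluator:
    def __init__(self, model, threads=4):
        self.model = model          # model name placed in each request
        self.threads = threads      # worker count, e.g. from --threads
        self.done = 0               # shared progress counter
        self._lock = threading.Lock()

    def _process_single_case(self, case):
        # Placeholder for the real request to the server.
        result = {"model": self.model, "case": case, "answer": case * 2}
        # Shared state is only touched while holding the lock.
        with self._lock:
            self.done += 1
        return result

    def process(self, cases):
        results = [None] * len(cases)
        with ThreadPoolExecutor(max_workers=self.threads) as pool:
            futures = {
                pool.submit(self._process_single_case, case): index
                for index, case in enumerate(cases)
            }
            # Futures complete in arbitrary order; the stored index
            # restores the original case order.
            for future in as_completed(futures):
                results[futures[future]] = future.result()
        return results
```

Calling `Evaluator("some-model", threads=3).process([1, 2, 3, 4])` returns the results in input order even though the workers may finish out of order.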