Commit Graph

1838 Commits

Author SHA1 Message Date
Naco Siren 93dc70c307
Merge 67012e3e64 into 58062860af 2025-12-17 05:51:06 +02:00
Daniel Bevenius 79dbae034a
model-conversion : remove -fa option in model card template [no ci] (#18088)
This commit updates the causal model card template and removes the
-fa option as it is no longer required (fa is auto detected).
2025-12-16 13:25:09 +01:00
Xuan-Son Nguyen 7b1db3d3b7
arg: clarify auto kvu/np being set on server (#17997)
* arg: clarify auto kvu/np being set on server

* improve docs

* use invalid_argument
2025-12-16 12:01:27 +01:00
Daniel Bevenius 9963b81f63
model-conversion : add note about verifying previous models (#18082)
This commit adds a note to the README in the model-conversion
examples, advising developers to verify that previous versions of models
pass logits verification before adding new models from the same family.
2025-12-16 11:17:40 +01:00
Daniel Bevenius db81d5ec4b
model-conversion : use CONVERTED_EMBEDDING_MODEL for embedding_verify_logits (#18079)
This commit updates the embedding model verification script to use the
CONVERTED_EMBEDDING_MODEL environment variable instead of using the
EMBEDDING_MODEL_PATH (the original embedding model path) as the basis
for the converted model file name.

The motivation for this is that currently, if the converted embedding model
file name differs from the original embedding model directory/name, the
verification script will look for the wrong .bin files that were
generated when running the models.
2025-12-16 11:17:20 +01:00
Georgi Gerganov 254098a279
common : refactor common_sampler + grammar logic changes (#17937)
* common : refactor common_sampler + grammar logic changes

* tests : increase max_tokens to get needed response

* batched : fix uninitialized samplers
2025-12-14 10:11:13 +02:00
Georgi Gerganov 77ad8542bd
model-conversion : cast logits to float32 (#18009) 2025-12-14 08:58:13 +02:00
Georgi Gerganov 3c6391e748
speculative-simple : free batch on exit (#17985) 2025-12-13 09:48:34 +02:00
Daniel Bevenius fd1085ffb7
model-conversion : use CONVERTED_MODEL value for converted model [no ci] (#17984)
* model-conversion : use CONVERTED_MODEL value for converted model [no ci]

This commit updates the model verification scripts to use the
CONVERTED_MODEL environment variable instead of using the MODEL_PATH
(the original model path) as the basis for the converted model file
name.

The motivation for this is that currently, if the converted model file name
differs from the original model directory/name, the verification scripts
will look for the wrong .bin files that were generated when running the
models.
For example, the following steps were not possible:
```console
(venv) $ huggingface-cli download google/gemma-3-270m-it --local-dir ggml-org/gemma-3-270m
(venv) $ python3 convert_hf_to_gguf.py ggml-org/gemma-3-270m --outfile test-bf16.gguf --outtype bf16
(venv) $ cd examples/model-conversion/
(venv) $ export MODEL_PATH=../../ggml-org/gemma-3-270m
(venv) $ export CONVERTED_MODEL=../../test-bf16.gguf
(venv) $ make causal-verify-logits
...
Data saved to data/llamacpp-test-bf16.bin
Data saved to data/llamacpp-test-bf16.txt
Error: llama.cpp logits file not found: data/llamacpp-gemma-3-270m.bin
Please run scripts/run-converted-model.sh first to generate this file.
make: *** [Makefile:62: causal-verify-logits] Error 1
```

With the changes in this commit, the above steps will now work as
expected.
2025-12-13 08:34:26 +01:00
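The naming behaviour described in #17984 above can be sketched as follows. This is a minimal illustration inferred only from the console log in that commit message, where `CONVERTED_MODEL=../../test-bf16.gguf` produces `data/llamacpp-test-bf16.bin`. The helper name `llamacpp_logits_path` and its `data_dir` parameter are hypothetical and are not taken from the repository's scripts.

```python
import os
from pathlib import PurePosixPath


def llamacpp_logits_path(converted_model: str, data_dir: str = "data") -> str:
    """Hypothetical helper: derive the logits file path from a model path."""
    # Keep only the file name without its extension,
    # e.g. "../../test-bf16.gguf" -> "test-bf16".
    stem = PurePosixPath(converted_model).stem
    return f"{data_dir}/llamacpp-{stem}.bin"


if __name__ == "__main__":
    # Assumption: fall back to the value from the log when the variable is unset.
    converted = os.environ.get("CONVERTED_MODEL", "../../test-bf16.gguf")
    print(llamacpp_logits_path(converted))
```

Passing the original model directory (`../../ggml-org/gemma-3-270m`) to the same helper gives `data/llamacpp-gemma-3-270m.bin`, which is the file the error message in the log reports as missing.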
Xuan-Son Nguyen 380b4c984e
common: support negated args (#17919)
* args: support negated args

* update docs

* fix typo

* add more neg options

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* rm duplicated arg

* fix LLAMA_ARG_NO_HOST

* add test

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-12 23:58:53 +01:00
Daniel Bevenius dada4c846d
model-conversion : remove max diff check in compare-logits [no ci] (#17954)
This commit removes the maximum difference check from
compare-logits.py, which would stop early if the difference between
the logits exceeded a threshold.

The motivation for removing this is that it can be useful to be able to
get the complete log for debugging/reporting purposes.
2025-12-12 13:25:16 +01:00
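As a rough illustration of the idea behind #17954 above, the sketch below compares two logit sequences and always reports the full statistics instead of aborting once a difference crosses a threshold. It is not the code from compare-logits.py; the function name `compare_logits` and the returned field names are assumptions made for this example.

```python
def compare_logits(reference, converted):
    """Compare two equal-length logit sequences and return summary statistics.

    Every element is visited; no threshold aborts the comparison early.
    """
    if len(reference) != len(converted):
        raise ValueError("logit sequences differ in length")
    if not reference:
        raise ValueError("logit sequences are empty")

    # Absolute difference for every position.
    diffs = [abs(r - c) for r, c in zip(reference, converted)]
    max_diff = max(diffs)
    return {
        "count": len(diffs),
        "max_abs_diff": max_diff,
        "max_abs_diff_index": diffs.index(max_diff),
        "mean_abs_diff": sum(diffs) / len(diffs),
    }


if __name__ == "__main__":
    # Made-up values, used only to show the shape of the report.
    print(compare_logits([1.0, 2.0, 4.0], [1.0, 2.5, 2.0]))
```

Whether a given difference is acceptable is left to whoever reads the report, which keeps the complete log available for debugging.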
Xuan-Son Nguyen 6c2131773c
cli: new CLI experience (#17824)
* wip

* wip

* fix logging, add display info

* handle commands

* add args

* wip

* move old cli to llama-completion

* rm deprecation notice

* move server to a shared library

* move ci to llama-completion

* add loading animation

* add --show-timings arg

* add /read command, improve LOG_ERR

* add args for speculative decoding, enable show timings by default

* add arg --image and --audio

* fix windows build

* support reasoning_content

* fix llama2c workflow

* color default is auto

* fix merge conflicts

* properly fix color problem

Co-authored-by: bandoti <bandoti@users.noreply.github.com>

* better loading spinner

* make sure to clean color on force-exit

* also clear input files on "/clear"

* simplify common_log_flush

* add warning in mtmd-cli

* implement console writer

* fix data race

* add attribute

* fix llama-completion and mtmd-cli

* add some notes about console::log

* fix compilation

---------

Co-authored-by: bandoti <bandoti@users.noreply.github.com>
2025-12-10 15:28:59 +01:00
Daniel Bevenius 2fa51c19b0
model-conversion : add token ids to prompt token output [no ci] (#17863)
This commit adds the token ids to the printed prompt outputs.

The motivation for this is that it can be useful to see the actual token
ids alongside the token strings for debugging.
2025-12-08 17:13:08 +01:00
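The kind of output that #17863 above describes might look like the following sketch, which prints each prompt token id next to its string. The function name `format_prompt_tokens`, the output layout, and the example ids and pieces are all hypothetical; the real scripts may format this differently.

```python
def format_prompt_tokens(token_ids, id_to_piece):
    """Return one line per prompt token, pairing the id with its string."""
    lines = []
    for tok_id in token_ids:
        # Fall back to a placeholder when the id is not in the mapping.
        piece = id_to_piece.get(tok_id, "<unknown>")
        lines.append(f"{tok_id} -> {piece!r}")
    return lines


if __name__ == "__main__":
    # Made-up vocabulary entries, for illustration only.
    vocab = {2: "<bos>", 9259: "Hello"}
    for line in format_prompt_tokens([2, 9259], vocab):
        print(line)
```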
Georgi Gerganov 8ce774a102
metal : fix build (#17799)
* metal : fix build

* tests : fix context destruction
2025-12-06 09:33:59 +02:00
Georgi Gerganov c41bde6fbd
metal : add residency sets keep-alive heartbeat (#17766)
* examples : add idle

* metal : attach residency sets to queue

* idle : add link

* idle : adjust intervals

* metal : add residency sets keep-alive heartbeat

* cont : adjust default keep-alive time
2025-12-05 19:38:54 +02:00
Daniel Bevenius 817d743cc1
examples : add missing code block end marker [no ci] (#17756)
This commit adds the missing code block end marker in simple-cmake-pkg
to correct the formatting.
2025-12-04 14:17:30 +01:00
Igor Smirnov 0874693b44
common : fix json schema with '\' in literals (#17307)
* Fix json schema with '\' in literals

* Add "literal string with escapes" test
2025-11-29 17:06:32 +01:00
Neo Zhang 7d2add51d8
sycl : support allocating more than 4GB of device memory, update the doc and script (#17566)
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2025-11-29 14:59:44 +02:00
Diego Devesa e072b2052e
ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276)
* ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched
Enabled in ggml-ci for testing.

* llama : update worst-case graph for unified cache

* ci : disable op offload in some tests

* fix spelling

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-28 17:33:23 +02:00
Piotr Wilkin (ilintar) ff55414c42
model : Qwen3 Next (#16095)
* Qwen3 Next - cleaned up version

* Whitespaces and stuff

* Correct minor errors

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Misc. fixes.

* Clean up code, add missing hybrid qualifier

* Did someone transpose the SOLVE_TRI result matrix? Perhaps...

* Whitespace

* Proper tensors for cb calls

* Use llama-graph.h vertical alignment

* BROKEN: chunking

* Set new tensors as inputs.

* Proper chunk logic

* It's the circle of life...

* More shenanigans for n_seq > 1

* Nail in the coffin?

* Fix Windows build

* Eh, one fails on Windows, the other fails on Mac... just use general capture.

* quant : cleanup

* model : cleanup

* qwen3 : cleanup

* cont : cleanup

* cont : cleanup

* ggml : revert change

* qwen3 : cleanup

* cont : cleanup

* Readd cmath

* qwen3 : fix typo

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Usual suspects

* fix my bad suggestion

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-28 12:02:56 +01:00
Daniel Bevenius 6ab8eacddf
examples : add -kvu to batched usage example [no ci] (#17469)
This commit adds the --kv-unified flag to the usage example
in the README.md file for the batched example.

The motivation for this is that without this flag the example will fail
with the following error:
```console
Hello my name is
split_equal: sequential split is not supported when there are coupled
sequences in the input batch (you may need to use the -kvu flag)
decode: failed to find a memory slot for batch of size 4
main: llama_decode() failed
```
2025-11-24 15:38:45 +02:00
william pan 4902eebe33
models : Added support for RND1 Diffusion Language Model (#17433)
* Converted RND1 model to GGUF weights

* RND1 llama.cpp support v1

* RND1 llama.cpp support v2 non causal bug

* RND1 llama.cpp support v3 documentation

* RND1 llama.cpp support v4 clean code

* linting issues

* RND1 pr fixes v1

* RND1 pr fixes v2

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Diffusion documentation edits

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-24 14:16:56 +08:00
naco-siren b0d47f2ea7 Fix linter issues in editorconfig-checker job
https://github.com/ggml-org/llama.cpp/actions/runs/19548770247/job/55974800633?pr=17413
2025-11-20 15:19:41 -08:00
naco-siren d7da9ea9a8 Merge branch 'ggml-master' into ai-chat-binding-2 2025-11-20 11:14:18 -08:00
naco-siren 254cd841b2 Remove cpu_features 2025-11-20 10:21:00 -08:00
Georgi Gerganov 196f5083ef
common : more accurate sampling timing (#17382)
* common : more accurate sampling timing

* eval-callback : minor fixes

* cont : add time_meas impl

* cont : fix log msg [no ci]

* cont : fix multiple definitions of time_meas

* llama-cli : exclude chat template init from time measurement

* cont : print percentage of unaccounted time

* cont : do not reset timings
2025-11-20 13:40:10 +02:00
Gabe Goodhart 5886f4f545
examples(gguf): GGUF example outputs (#17025)
* feat(llama-gguf): Print out the tensor type in llama-gguf r

Branch: Mamba2Perf

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat(off-topic): print the number of elements in tensors with llama-gguf

Branch: Mamba2SSD

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* style: valign

Branch: GGUFToolOutputs

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* Update examples/gguf/gguf.cpp

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-05 19:58:16 +02:00
Daniel Bevenius ed8aa63320
model-conversion : pass config to from_pretrained (#16963)
This commit modifies the script `run-org-model.py` to ensure that the
model configuration is explicitly passed to the `from_pretrained` method
when loading the model. It also removes a duplicate configuration
loading which was a mistake.

The motivation for this change is that it enables the config object to be
modified and then passed to the model loading function, which can be
useful when testing new models.
2025-11-03 18:01:59 +01:00
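The pattern described in #16963 above, loading the configuration once, optionally modifying it, and passing it explicitly to `from_pretrained`, can be sketched with the Hugging Face `transformers` API as below. This is not the code from `run-org-model.py`; the function name `load_model_with_config` and the `modify_config` callback are assumptions made for this example.

```python
def load_model_with_config(model_path, modify_config=None):
    """Load a causal LM, passing an explicitly loaded (and optionally edited) config."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import AutoConfig, AutoModelForCausalLM

    # Load the configuration once.
    config = AutoConfig.from_pretrained(model_path)

    # Give the caller a chance to adjust the config before the model is built.
    if modify_config is not None:
        modify_config(config)

    # Pass the same config object to the model loader.
    return AutoModelForCausalLM.from_pretrained(model_path, config=config)
```

A caller testing a new model could, for example, pass a callback that lowers `config.num_hidden_layers`, assuming the model's config exposes that attribute.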
Han Yin e76554304d lib: enable app optimization 2025-10-31 18:33:18 -07:00
Han Yin 33987b56fa jni: introduce a logging util to filter different logging levels on different build types 2025-10-31 18:28:10 -07:00
Han Yin 3fa3c15c5c lib: revert System.load back to System.loadLibrary 2025-10-31 13:54:41 -07:00
Han Yin f94efbacbb cleanup: remove Arm AI Chat/Playground app source code; replace with the basic sample app from https://github.com/hanyin-arm/Arm-AI-Chat-Sample
Note: the full Google Play version of the AI Chat app will be open sourced in another repo soon, therefore didn't go through the trouble of pruning the history using `git filter-repo` here.
2025-10-30 13:37:59 -07:00
Han Yin cadaf8044b lib: remove kleidi-llama related namings 2025-10-28 11:39:19 -07:00
Han Yin 266fc314ef lib: change `LlamaTier` to `ArmCpuTier` 2025-10-28 11:39:19 -07:00
Han Yin 3644082a82 lib: perform engine state check inclusively instead of exclusively 2025-10-28 11:39:19 -07:00
Han Yin f10d1ab022 lib: add File version for GGUF Magic number verification 2025-10-28 11:39:19 -07:00
Han Yin f833c3a7ac app: extract AppContent from MainActivity to a separate file in ui package 2025-10-28 11:39:19 -07:00
Han Yin 42e3972b30 app: remove deprecated SystemUIController from accompanist by migrating to EdgeToEdge 2025-10-28 11:39:19 -07:00
Han Yin 7c2e6d0a2f app: bump ksp version 2025-10-28 11:39:19 -07:00
Han Yin 8897b78055 llama: update the app's package name and namespace 2025-10-28 11:39:19 -07:00
Han Yin 56e83b723b llama: update the library's package name and namespace 2025-10-28 11:39:19 -07:00
Han Yin 96817ae667 llama: update the library name in JNI and CMake project 2025-10-28 11:39:19 -07:00
Han Yin 6dfdc2c105 lib: replace the factory pattern for deprecated tiered lib loading with single instance pattern 2025-10-28 11:39:19 -07:00
Han Yin 63e5bd0771 lib: support x86-64 by dynamically setting Arm-related definitions 2025-10-28 11:39:19 -07:00
Han Yin 8f90e42ee2 UI: fix the layout issue on large font sizes 2025-10-28 11:39:19 -07:00
Han Yin 930e707608 UI: better usage of tertiary colors to highlight model cards but not for warnings 2025-10-28 11:39:19 -07:00
Han Yin ad85bca98b UI: make more room for assistant message bubble's width 2025-10-28 11:39:19 -07:00
Han Yin 83abff8a64 UI: minor color palette changes; emphasize the bottom bar FABs; fix Settings Screen menu item label 2025-10-28 11:39:18 -07:00
Han Yin 2223c54cc6 core: further improve the performance on native methods 2025-10-28 11:39:18 -07:00
Han Yin d5220549b6 UI: fix the font size auto scaling for ArmFeaturesVisualizer 2025-10-28 11:39:18 -07:00