Commit Graph

460 Commits

Author SHA1 Message Date
Saba Fallah dc2066e535 check with fixed expected resutls 2025-12-14 16:32:36 +01:00
Saba Fallah 47f0fee6c9 testing deepseek-ocr
quick and dirty test script comparing results of Qwen2.5-VL vs DeepSeek-OCR
2025-12-11 17:00:11 +01:00
Saba Fallah 4cbbe8ab6f minor: editconfig-check fix 2025-12-11 10:32:38 +01:00
Saba Fallah d70f171fac merge with changes from https://github.com/ggml-org/llama.cpp/pull/17909
added new opt to tests.sh to disable flash-attn
2025-12-11 10:11:27 +01:00
Saba Fallah 33fabf0bd8 Merge branch 'master' into sf/deepseek-ocr-merge-test
# Conflicts:
#	tools/mtmd/clip.cpp
#	tools/mtmd/mtmd-cli.cpp
2025-12-11 08:13:50 +01:00
Saba Fallah aaf2fd17bb minor: editconfig-check fix 2025-12-11 07:31:08 +01:00
Xuan-Son Nguyen c6b2c9310c
mtmd: some small clean up (#17909)
* clip: add support for fused qkv in build_vit

* use bulid_ffn whenever possible

* fix internvl

* mtmd-cli: move image to beginning

* test script: support custom args
2025-12-10 22:20:06 +01:00
Xuan-Son Nguyen 34a6d86982
cli: enable jinja by default (#17911)
* cli: enable jinja by default

* Update common/arg.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-10 22:19:42 +01:00
Pascal f32ca51bfe
server: add presets (config) when using multiple models (#17859)
* llama-server: recursive GGUF loading

Replace flat directory scan with recursive traversal using
std::filesystem::recursive_directory_iterator. Support for
nested vendor/model layouts (e.g. vendor/model/*.gguf).
Model name now reflects the relative path within --models-dir
instead of just the filename. Aggregate files by parent
directory via std::map before constructing local_model

* server : router config POC (INI-based per-model settings)

* server: address review feedback from @aldehir and @ngxson

PEG parser usage improvements:
- Simplify parser instantiation (remove arena indirection)
- Optimize grammar usage (ws instead of zero_or_more, remove optional wrapping)
- Fix last line without newline bug (+ operator instead of <<)
- Remove redundant end position check

Feature scope:
- Remove auto-reload feature (will be separate PR per @ngxson)
- Keep config.ini auto-creation and template generation
- Preserve per-model customization logic

Co-authored-by: aldehir <aldehir@users.noreply.github.com>
Co-authored-by: ngxson <ngxson@users.noreply.github.com>

* server: adopt aldehir's line-oriented PEG parser

Complete rewrite of INI parser grammar and visitor:
- Use p.chars(), p.negate(), p.any() instead of p.until()
- Support end-of-line comments (key=value # comment)
- Handle EOF without trailing newline correctly
- Strict identifier validation ([a-zA-Z_][a-zA-Z0-9_.-]*)
- Simplified visitor (no pending state, no trim needed)
- Grammar handles whitespace natively via eol rule

Business validation preserved:
- Reject section names starting with LLAMA_ARG_*
- Accept only keys starting with LLAMA_ARG_*
- Require explicit section before key-value pairs

Co-authored-by: aldehir <aldehir@users.noreply.github.com>

* server: fix CLI/env duplication in child processes

Children now receive minimal CLI args (executable, model, port, alias)
instead of inheriting all router args. Global settings pass through
LLAMA_ARG_* environment variables only, eliminating duplicate config
warnings.

Fixes: Router args like -ngl, -fa were passed both via CLI and env,
causing 'will be overwritten' warnings on every child spawn

* add common/preset.cpp

* fix compile

* cont

* allow custom-path models

* add falsey check

* server: fix router model discovery and child process spawning

- Sanitize model names: replace / and \ with _ for display
- Recursive directory scan with relative path storage
- Convert relative paths to absolute when spawning children
- Filter router control args from child processes
- Refresh args after port assignment for correct port value
- Fallback preset lookup for compatibility
- Fix missing argv[0]: store server binary path before base_args parsing

* Revert "server: fix router model discovery and child process spawning"

This reverts commit e3832b42eeea7fcb108995966c7584479f745857.

* clarify about "no-" prefix

* correct render_args() to include binary path

* also remove arg LLAMA_ARG_MODELS_PRESET for child

* add co-author for ini parser code

Co-authored-by: aldehir <hello@alde.dev>

* also set LLAMA_ARG_HOST

* add CHILD_ADDR

* Remove dead code

---------

Co-authored-by: aldehir <aldehir@users.noreply.github.com>
Co-authored-by: ngxson <ngxson@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: aldehir <hello@alde.dev>
2025-12-10 22:18:21 +01:00
Saba Fallah ed944cd25b fix: test-1.jpg ORC issue with small (640) resolution
setting min-resolution base (1024) max large (1280) for dynamic-resolution
2025-12-10 20:20:55 +01:00
Georgi Gerganov 4dff236a52
ggml : remove GGML_KQ_MASK_PAD constant (#17910)
* ggml : remove GGML_KQ_MASK_PAD constant

* cont : remove comment
2025-12-10 20:53:16 +02:00
Xuan-Son Nguyen 6c2131773c
cli: new CLI experience (#17824)
* wip

* wip

* fix logging, add display info

* handle commands

* add args

* wip

* move old cli to llama-completion

* rm deprecation notice

* move server to a shared library

* move ci to llama-completion

* add loading animation

* add --show-timings arg

* add /read command, improve LOG_ERR

* add args for speculative decoding, enable show timings by default

* add arg --image and --audio

* fix windows build

* support reasoning_content

* fix llama2c workflow

* color default is auto

* fix merge conflicts

* properly fix color problem

Co-authored-by: bandoti <bandoti@users.noreply.github.com>

* better loading spinner

* make sure to clean color on force-exit

* also clear input files on "/clear"

* simplify common_log_flush

* add warning in mtmd-cli

* implement console writter

* fix data race

* add attribute

* fix llama-completion and mtmd-cli

* add some notes about console::log

* fix compilation

---------

Co-authored-by: bandoti <bandoti@users.noreply.github.com>
2025-12-10 15:28:59 +01:00
Aldehir Rojas 2fbe3b7bb7
common : add parser for ministral/mistral large 3/devstral 2 (#17713) 2025-12-09 17:31:04 -06:00
bluebread 016140699f mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template 2025-12-09 16:31:44 +00:00
Rhys-T 63908b631a
cmake: fix Mach-O current version number (#17877)
PR #17091 set the VERSION of various libraries to 0.0.abcd, where abcd
is the LLAMA_BUILD_NUMBER. That build number is too large to fit in the
Mach-O 'current version' field's 'micro' part, which only goes up to
255. This just sets the Mach-O current version to 0 to get it building
properly again.

Fixes #17258.
2025-12-09 13:17:41 +02:00
Xuan-Son Nguyen 951520ddb0
server: delegate result_state creation to server_task (#17835)
* server: delegate result_state creation to server_task

* remove unued states

* add more docs
2025-12-08 17:04:38 +01:00
Xuan-Son Nguyen f896d2c34f
server: improve speed of speculative decoding (#17808)
* server: improve speed of speculative decoding

* fix small draft case

* add link to the PR

* server : fix generation time measurement

* server : fix draft acceptance logs (add SRV_CNT, SLT_CNT macros)

* server : add comment

* add PR to docs

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-08 14:35:28 +01:00
Xuan-Son Nguyen 37a4f63244
server : add development documentation (#17760)
* first draft

* rewrite

* update & remove duplicated sections
2025-12-08 13:54:58 +01:00
Georgi Gerganov 2bc96931d2
server : make cache_reuse configurable per request (#17858) 2025-12-08 12:43:12 +02:00
bluebread 5174a1e69a mtmd: minor fix 2025-12-08 04:54:19 +00:00
bluebread 48c6cf2132 mtmd: convert model in FP16 2025-12-08 02:36:00 +00:00
bluebread 53273f83f8 mtmd: fixed wrong input setting 2025-12-07 23:58:22 +00:00
bluebread 5dfcc5abb1 mtmd: add detailed comments for resize_bicubic_pillow 2025-12-07 10:15:09 +00:00
Vishal Singh 017761daf5
ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690)
* ggml-zennn: add ZenDNN backend support

* ggml-zendnn : address ZenDNN backend review fixes and suggestions

* docs : apply blockquote syntax to ZenDNN docs

---------

Co-authored-by: Manoj Kumar <mkumar@zettabolt.com>
2025-12-07 00:13:33 +08:00
Xuan-Son Nguyen c42712b056
server: support multiple generations from one prompt (OAI "n" option) (#17775)
* backend support

* server: support multiple generations from one prompt (OAI "n" option)

* fix invalid batch

* format oai

* clean up

* disable ctx shift

* add test

* update comments

* fix style

* add n_cmpl to docs [no ci]

* allowing using both n_cmpl and n
2025-12-06 15:54:38 +01:00
Aleksander Grygier a28e3c7567
webui: Stop generation from chat sidebar (#17806)
* feat: Add stop generation button for Conversation Item

* chore: update webui build output
2025-12-06 13:29:15 +01:00
Aleksander Grygier e31b5c55c3
webui: Fix context available value in Multi-model Router mode (#17804)
* fix: Use context size from `/props?model=...` in ROUTER mode

* chore: update webui build output
2025-12-06 13:23:29 +01:00
Aleksander Grygier 21f24f27a9
webui: Per-conversation system message with UI displaying, edition & branching (#17275)
* feat: Per-conversation system message with optional display in UI, edition and branching (WIP)

* chore: update webui build output
2025-12-06 13:19:05 +01:00
bluebread 2d918b3e21 mtmd: make sam hparams configurable 2025-12-06 06:55:53 +00:00
bluebread 15f2ada0ed mtmd: simplify get_rel_pos 2025-12-06 06:32:41 +00:00
Saba Fallah 705394c27a minor editorconfig-check fixes 2025-12-05 13:27:52 +01:00
Saba Fallah d981f19e9d minor editorconfig-check fixes 2025-12-05 13:18:15 +01:00
Saba Fallah 5f2ee1aecf
Merge branch 'ggml-org:master' into sf/deepseek-ocr 2025-12-05 11:56:06 +01:00
Saba Fallah f5bd310a5e minor formatting and style 2025-12-05 09:30:58 +01:00
Saba Fallah 076138a428 corrected code-branch when flash-attn disabled
enabling usage of --flash-attn option
2025-12-04 23:45:59 +01:00
Saba Fallah 5381b9cf63 using common build_attn in sam 2025-12-04 23:13:29 +01:00
bluebread fc3f625fef mtmd: support combined QKV projection in buid_vit 2025-12-04 17:57:43 +00:00
Xuan-Son Nguyen 9d0229967a
server: strip content-length header on proxy (#17734) 2025-12-04 16:32:57 +01:00
Saba Fallah a661c52990 reverting automatically removed spaces 2025-12-04 16:12:41 +01:00
Xuan-Son Nguyen c4c10bfb86
server: move msg diffs tracking to HTTP thread (#17740)
* server: move msg diffs tracking to HTTP thread

* wip

* tool call tests ok

* minor : style

* cont : fix

* move states to server_response_reader

* add safe-guard

* fix

* fix 2

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-04 15:46:08 +01:00
Saba Fallah c73748ab5d Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr-cleanup
# Conflicts:
#	gguf-py/gguf/tensor_mapping.py
2025-12-04 15:09:32 +01:00
Saba Fallah 386ba479a2 clean up 2025-12-04 15:05:58 +01:00
bluebread 7451b84105 mtmd: fix tensor names for image newlines and view separator 2025-12-04 13:26:53 +00:00
Adrien Gallouët ef75a89fdb
build : move _WIN32_WINNT definition to headers (#17736)
Previously, cmake was forcing `_WIN32_WINNT=0x0A00` for MinGW builds,
This caused "macro redefined" warnings with toolchains that define the version.

This also removes the `GGML_WIN_VER` variable as it is no longer needed.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-04 07:04:02 +01:00
Piotr Wilkin (ilintar) c6d1a00aa7
Add a couple of file types to the text section (#17670)
* Add a couple of file types to the text section

* Format + regenerate index

* Rebuild after rebase
2025-12-03 21:45:06 +01:00
Aleksander Grygier e9f9483464
Use OpenAI-compatible `/v1/models` endpoint by default (#17689)
* refactor: Data fetching via stores

* chore: update webui build output

* refactor: Use OpenAI compat `/v1/models` endpoint by default to list models

* chore: update webui build output

* chore: update webui build output
2025-12-03 20:49:09 +01:00
Andika Wasisto 41c5e02f42
webui: Fix zero pasteLongTextToFileLen to disable conversion being overridden (#17445)
* webui: Fix zero pasteLongTextToFileLen to disable conversion being overridden

Zero pasteLongTextToFileLen should disable the conversion, but it was
overwritten with 2500.

* Apply suggestions from code review

* Update webui build
2025-12-03 20:45:17 +01:00
bluebread b26b507c4e mtmd: refactor code & remove unused helper functions 2025-12-03 16:23:46 +00:00
bluebread b696c54756 mtmd: remove --dsocr-mode argument 2025-12-03 14:54:16 +00:00
Pascal e7c2cf1356
server: add router multi-model tests (#17704) (#17722)
* llama-server: add router multi-model tests (#17704)

Add 4 test cases for model router:
- test_router_unload_model: explicit model unloading
- test_router_models_max_evicts_lru: LRU eviction with --models-max
- test_router_no_models_autoload: --no-models-autoload flag behavior
- test_router_api_key_required: API key authentication

Tests use async model loading with polling and graceful skip when
insufficient models available for eviction testing.

utils.py changes:
- Add models_max, models_dir, no_models_autoload attributes to ServerProcess
- Handle JSONDecodeError for non-JSON error responses (fallback to text)

* llama-server: update test models to new HF repos

* add offline

* llama-server: fix router LRU eviction test and add preloading

Fix eviction test: load 2 models first, verify state, then load
3rd to trigger eviction. Previous logic loaded all 3 at once,
causing first model to be evicted before verification could occur.

Add module fixture to preload models via ServerPreset.load_all()
and mark test presets as offline to use cached models

* llama-server: fix split model download on Windows

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-12-03 15:10:37 +01:00