Commit Graph

393 Commits

Author SHA1 Message Date
Ed Addario a2b86d7fd9
Minor refactoring 2025-11-17 14:14:05 +00:00
Ed Addario 1f3db496cc
Calculate layer_sum only for legacy 2025-11-17 13:36:28 +00:00
Ed Addario 76566b83de
Enforce same-size between compared tensors 2025-11-17 13:28:35 +00:00
Ed Addario fb2b09a43c
Skip experts with zero count (unused) 2025-11-17 13:06:37 +00:00
Ed Addario 63cbcc6dfc
Refactor legacy determination 2025-11-17 13:05:34 +00:00
Ed Addario ae1cbc707b
Warn if problem with previous layer 2025-11-17 13:04:16 +00:00
Ed Addario 5384a11b94
Initialise layer and tensor variables 2025-11-17 13:00:47 +00:00
Ed Addario 559ae9ab89
Refactor legacy imatrix handling 2025-11-17 10:19:34 +00:00
Ed Addario b2b7175e19
Fix bug when vectors are zero 2025-11-06 15:12:09 +00:00
Ed Addario 8bd9d87d3e
Merge branch 'master' into imatrix 2025-10-31 23:19:54 +00:00
Georgi Gerganov c22473b580
server : don't print user inputs to console (#16871) 2025-10-31 10:54:19 +02:00
Daniel Bevenius 0f715b4e75
server : fix typos in server.cpp comments [no ci] (#16883) 2025-10-31 09:51:26 +01:00
Ed Addario ce046dcee8
Save statistics to imatrix 2025-10-30 22:43:46 +00:00
chansikpark 16724b5b68
server : bump request URI max length to 32768 (#16862) 2025-10-30 20:22:23 +02:00
Georgi Gerganov b52edd2558
server : remove n_past (#16818)
* server : remove n_past

* server : replace slot.n_prompt_tokens() with slot.task->n_tokens()

* server : fixes + clean-up

* cont : fix context shift

* server : add server_tokens::pos_next()

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

* server : fix pos_next() usage

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
2025-10-30 18:42:57 +02:00
JJJYmmm d261223d24
model: add support for qwen3vl series (#16780)
* support qwen3vl series.

Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com>
Co-authored-by: yairpatch <yairpatch@users.noreply.github.com>
Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com>

* bugfix: fix the arch check for qwen3vl-moe.

* use build_ffn

* optimize deepstack structure

* optimize deepstack feature saving

* Revert "optimize deepstack feature saving" for temporal fix

This reverts commit f321b9fdf1.

* code clean

* use fused qkv in clip

* clean up / rm is_deepstack_layers for simplification

* add test model

* move test model to "big" section

* fix imrope check

* remove trailing whitespace

* fix rope fail

* metal : add imrope support

* add imrope support for sycl

* vulkan: add imrope w/o check

* fix vulkan

* webgpu: add imrope w/o check

* Update gguf-py/gguf/tensor_mapping.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix tensor mapping

---------

Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com>
Co-authored-by: yairpatch <yairpatch@users.noreply.github.com>
Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-30 16:19:14 +01:00
Tianyue-Zhao bacddc049a
model: Add support for CogVLM model (#15002)
* Added GGUF mappings for CogVLM model

* Add tensor mapping for CogVLM visual encoder

* Add CogVLM to conversion script, no vision part yet

* Added CogVLM vision model to conversion script

* Add graph for CogVLM CLIP model

* Add graph for CogVLM

* Fixes for CogVLM. Now compiles.

* Model now runs

* Fixes for cogvlm graph

* Account for graph context change after rebase

* Changes for whitespace

* Changes in convert script according to comments

* Switch CogVLM LLM graph to merged QKV tensor

* Use rope_type variable instead of direct definition

* Change CogVLM CLIP encoder to use SWIGLU

* Switch CogVLM CLIP to use merged QKV

* Apply rebase edits and remove ggml_cont call that is now unnecessary

* clean up

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-10-30 12:18:50 +01:00
Ed Addario 7d8819f57a
Improve compute_layer_statistics() processing of mismatched tensor sizes 2025-10-29 18:36:01 +00:00
Ed Addario 006e7ef991
Improve compute_vector_statistics() processing of mismatched tensor sizes 2025-10-29 18:35:39 +00:00
Ed Addario 2a6f5d7e60
Refactor variable names 2025-10-29 18:32:47 +00:00
Xuan-Son Nguyen e3af5563bd
llama: store mrope data in KV cell (#16825)
* llama: store mrope data in KV cell

* correct x,y ordering

* address review comments

* add consistency checks

* Update src/llama-kv-cache.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* add TODO

* fix asan error

* kv-cells : improve ext handling

* cont : fix headers

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-10-29 18:09:18 +01:00
Ed Addario 6ff0a79e54
Minor stats report cosmetic changes 2025-10-29 07:59:40 +00:00
Ed Addario 86fabce58d
Clamp values 2025-10-28 23:10:44 +00:00
Ed Addario ab015065b8
Minor refactoring 2025-10-28 23:10:12 +00:00
Ed Addario 92a42bac3d
Type refactoring 2025-10-28 23:06:29 +00:00
Ed Addario b5068df804
Minor refactoring 2025-10-28 23:03:52 +00:00
Ed Addario 0b0381c94c
Merge Cosine Similarity and L2 Norm computation into single loop 2025-10-28 21:41:31 +00:00
Ed Addario dc4a04b5c5
Adjust size calculation and change fallback value to 0.0f 2025-10-28 21:35:35 +00:00
Ed Addario 683ef8dfb7
Fill zeros for experts with zero counts to preserve shape 2025-10-28 18:35:17 +00:00
Ed Addario 637e674da6
Avoid division by zero on zero-count matrices 2025-10-28 18:33:37 +00:00
Ed Addario c9a0874f35
Clamp CosSim to [-1, 1] to avoid float drift 2025-10-28 18:29:59 +00:00
Ed Addario af3b6aca22
Fix legacy_mode getting overwritten on each tensor bug 2025-10-28 18:27:19 +00:00
Georgi Gerganov 85a7d8677b
memory : remove KV cache size padding (#16812)
* memory : remove KV cache size padding

* cont : restore padding for n_kv tensor shape

* server : use slot context size instead of training context size

* server : simplify context limit logic
2025-10-28 20:19:44 +02:00
Georgi Gerganov a8ca18b4b8
llama-bench : clarify benchmarked parts of the computation (#16823) 2025-10-28 19:41:43 +02:00
Aldehir Rojas 280d97be96
grammar : support array references in json schema (#16792)
* grammar : support array references in json schema

* Update json-schema-to-grammar.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* grammar : improve regex when naming ref derived rules

* grammar : replace non-conformant definitions array with anyOf test case

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-28 09:37:52 +01:00
Xuan-Son Nguyen e1ab084803
mtmd : fix idefics3 preprocessing (#16806)
* mtmd : fix idefics3 preprocessing

* disable granite test

* fix test for granite
2025-10-27 23:12:16 +01:00
Xuan-Son Nguyen c55d53acec
model : add LightOnOCR-1B model (#16764)
* model : add LightOnOCR-1B model

* add test
2025-10-27 16:02:58 +01:00
Ed Addario 8fd2aca8ec
Merge branch 'master' into imatrix 2025-10-25 12:18:30 +01:00
Florian Badie 69e9ff0103
webui: support q URL parameter (#16728)
* webui: support q URL parameter

Fixes #16722
I’ve checked that it works with Firefox’s AI tools

* webui: apply suggestions from code review

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* chore: update webui static build

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-10-24 14:10:29 +02:00
Johannes Gäßler 0bf47a1dbb
server: add memory breakdown print (#16740) 2025-10-23 21:30:17 +02:00
Xuan-Son Nguyen d0660f237a
mtmd-cli : allow using --jinja (#16718)
* mtmd-cli : allow using --jinja

* support -sys

* implement chat_history

* fix clear memory

* rm -sys support, added TODO
2025-10-23 15:00:49 +02:00
Prajwal B Mehendarkar fe6a9882ac
Manually link -lbsd to resolve flock symbol on AIX (#16610) 2025-10-23 19:37:31 +08:00
matteo 8cf6b42d46
server : send partial stop string when <EOG> is reached (#15007) 2025-10-23 12:32:24 +03:00
Pascal 9b9201f65a
webui: introduce OpenAI-compatible model selector in JSON payload (#16562)
* webui: introduce OpenAI-compatible model selector in JSON payload

* webui: restore OpenAI-Compatible model source of truth and unify metadata capture

This change re-establishes a single, reliable source of truth for the active model:
fully aligned with the OpenAI-Compat API behavior

It introduces a unified metadata flow that captures the model field from both
streaming and non-streaming responses, wiring a new onModel callback through ChatService
The model name is now resolved directly from the API payload rather than relying on
server /props or UI assumptions

ChatStore records and persists the resolved model for each assistant message during
streaming, ensuring consistency across the UI and database
Type definitions for API and settings were also extended to include model metadata
and the onModel callback, completing the alignment with OpenAI-Compat semantics

* webui: address review feedback from allozaur

* webui: move model selector into ChatForm (idea by @allozaur)

* webui: make model selector more subtle and integrated into ChatForm

* webui: replaced the Flowbite selector with a native Svelte dropdown

* webui: add developer setting to toggle the chat model selector

* webui: address review feedback from allozaur

Normalized streamed model names during chat updates
by trimming input and removing directory components before saving
or persisting them, so the conversation UI shows only the filename

Forced model names within the chat form selector dropdown to render as
a single-line, truncated entry with a tooltip revealing the full name

* webui: toggle displayed model source for legacy vs OpenAI-Compat modes

When the selector is disabled, it falls back to the active server model name from /props

When the model selector is enabled, the displayed model comes from the message metadata
(the one explicitly selected and sent in the request)

* Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormActions.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/constants/localstorage-keys.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormModelSelector.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/services/chat.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/services/chat.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: refactor model selector and persistence helpers

- Replace inline portal and event listeners with proper Svelte bindings
- Introduce 'persisted' store helper for localStorage sync without runes
- Extract 'normalizeModelName' utils + Vitest coverage
- Simplify ChatFormModelSelector structure and cleanup logic

Replaced the persisted store helper's use of '$state/$effect' runes with
a plain TS implementation to prevent orphaned effect runtime errors
outside component context

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: document normalizeModelName usage with inline examples

* Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormModelSelector.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/stores/models.svelte.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/stores/models.svelte.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: extract ModelOption type into dedicated models.d.ts

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: refine ChatMessageAssistant displayedModel source logic

* webui: stabilize dropdown, simplify model extraction, and init assistant model field

* chore: update webui static build

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* chore: npm format, update webui static build

* webui: align sidebar trigger position, remove z-index glitch

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-10-22 16:58:23 +02:00
Ed Addario c81f7cdf9f
Merge branch 'master' into imatrix 2025-10-20 21:00:11 +01:00
Aleksander Grygier c9c1972e2c
Handle legacy 'context' attachments (#16687) 2025-10-20 19:49:02 +02:00
Aleksander Grygier 79068501fa
Prevent premature submission on IME input (#16673)
* fix: Prevent premature submission on IME input

* chore: update webui static build

* refactor: Put IME completion checker in a helper function and add checking for `KeyboardEvent.eventKey === 229`

* chore: update webui static build

* chore: update webui static build

* chore: update webui static build
2025-10-20 14:21:12 +02:00
Aleksander Grygier 0e4a0cf2fa
Import/Export UX improvements (#16619)
* webui : added download action (#13552)

* webui : import and export (for all conversations)

* webui : fixed download-format, import of one conversation

* webui : add ExportedConversations type for chat import/export

* feat: Update naming & order

* chore: Linting

* feat: Import/Export UX improvements

* chore: update webui build output

* feat: Update UI placement of Import/Export tab in Chat Settings Dialog

* refactor: Cleanup

chore: update webui build output

* feat: Enable shift-click multiple conversation items selection

* chore: update webui static build

* chore: update webui static build

---------

Co-authored-by: Sascha Rogmann <github@rogmann.org>
2025-10-20 13:29:14 +02:00
Aleksander Grygier 13f2cfad41
Enable per-conversation loading states to allow having parallel conversations (#16327)
* feat: Per-conversation loading states and tracking streaming stats

* chore: update webui build output

* refactor: Chat state management

Consolidates loading state management by using a global `isLoading` store synchronized with individual conversation states.

This change ensures proper reactivity and avoids potential race conditions when updating the UI based on the loading status of different conversations. It also improves the accuracy of statistics displayed.

Additionally, slots service methods are updated to use conversation IDs for per-conversation state management, avoiding global state pollution.

* feat: Adds loading indicator to conversation items

* chore: update webui build output

* fix: Fix aborting chat streaming

Improves the chat stream abortion process by ensuring that partial responses are saved before the abort signal is sent.

This avoids a race condition where the onError callback could clear the streaming state before the partial response is saved. Additionally, the stream reading loop and callbacks are now checked for abort signals to prevent further processing after abortion.

* refactor: Remove redundant comments

* chore: build webui static output

* refactor: Cleanup

* chore: update webui build output

* chore: update webui build output

* fix: Conversation loading indicator for regenerating messages

* chore: update webui static build

* feat: Improve configuration

* feat: Install `http-server` as dev dependency to not need to rely on `npx` in CI
2025-10-20 12:41:13 +02:00
Radoslav Gerganov 41386cf365
rpc : report actual free memory (#16616)
* rpc : report actual free memory

Start reporting the free memory on every device instead of using
fixed values. Now llama-cli users can get a nice memory breakdown
when using RPC devices.

* drop --mem in rpc-server
2025-10-17 18:02:52 +03:00