Sigbjørn Skjæret
b3e3060f4e
ci : move release details to the top visible by default ( #17719 )
2025-12-03 09:22:46 +01:00
Herman Semenoff
37adc9c6ba
ggml, llama : use defaulted constructors/destructors ( #17649 )
2025-12-03 07:12:18 +01:00
Marcos Del Sol Vives
16cc3c606e
build: document how to compile with Vulkan using Debian/Ubuntu packages ( #17688 )
2025-12-03 08:25:11 +08:00
Xuan-Son Nguyen
13628d8bdb
server: add --media-path for local media files ( #17697 )
* server: add --media-path for local media files
* remove unused fn
2025-12-02 22:49:20 +01:00
Xuan-Son Nguyen
a96283adc4
mtmd: fix --no-warmup ( #17695 )
2025-12-02 22:48:08 +01:00
Ali Tariq
4eba8d9451
ci : RVV1.0 builds with tests ( #16682 )
* Added RISC-V supported tests
* Added default value for LLAMA_FATAL_WARNINGS and option to specify by user
* Added RISC-V supported tests
* Added default value for LLAMA_FATAL_WARNINGS and option to specify by user
* Removed apt prompt
* Added RISC-V specific tests with corrections
Corrections included:
1. Changed the test names from Debian to Ubuntu, as Ubuntu is more stable than Debian Trixie
2. Added an explicit compiler to the cmake command, as GCC versions below 14 have been observed to throw errors with RVV1.0 and some other extensions
3. Added dependencies that are not installed by default on RISC-V Ubuntu 24.04
4. Used a separate ccache directory for each job, since ccache results differ between jobs and sharing them may stop ccache from working
* Resolved the merge conflict and cleaned up run.sh
* Update ci/run.sh
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Removed previously added build ci for RISC-V
* Removed trailing whitespaces
* corrected build name
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* cleanup
* Enabled build tests (1)
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Enabled build tests (2)
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* enable openssl
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-02 21:46:10 +01:00
Jeff Bolz
61bde8e21f
vulkan: Reduce temporary memory usage for TOP_K ( #17623 )
- Compute row size for the temp buffer based on the output of the first pass.
- Update shader addressing math to use the output row size
- Pass the output row size as "ncols_output", what used to be "ncols_output" is now "k"
For the common case of K=40 and src0=(200000,1,1,1), this reduces the temporary buffer
from about 3.2MB to 500KB.
2025-12-02 19:22:04 +01:00
xiaobing318
e251e5ebbe
cmake : add utf8 compilation options for msvc ( #17682 )
2025-12-02 19:50:57 +02:00
Chad Voegele
c4357dcc35
Server: Change Invalid Schema from Server Error (500) to User Error (400) ( #17572 )
* Make invalid schema a user error (400)
* Move invalid_argument exception handler to ex_wrapper
* Fix test
* Simplify test back to original pattern
2025-12-02 17:33:50 +01:00
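The status mapping described in this commit can be sketched as follows. This is a minimal illustration, not the server's actual code: `handle_with_status` and `HttpResult` are hypothetical names, and the sketch assumes a malformed schema surfaces as `std::invalid_argument`, as the bullet about moving the `invalid_argument` handler into `ex_wrapper` suggests.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical result type: an HTTP status code plus a message body.
struct HttpResult {
    int status;
    std::string body;
};

// Hypothetical wrapper in the spirit of ex_wrapper: runs a handler and maps
// exceptions to HTTP status codes.
HttpResult handle_with_status(const std::function<std::string()> & handler) {
    try {
        return {200, handler()};
    } catch (const std::invalid_argument & e) {
        // Bad input from the client (e.g. an invalid JSON schema): user error.
        return {400, e.what()};
    } catch (const std::exception & e) {
        // Anything else is treated as an internal server failure.
        return {500, e.what()};
    }
}
```

The more specific `std::invalid_argument` handler must come before the generic `std::exception` one; otherwise every error would be caught by the 500 branch.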
Adrien Gallouët
e148380c7c
ggml : use svcntb() for SVE vector length detection ( #17474 )
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-02 18:21:11 +02:00
TianHao324
a2b0fe8d37
CANN: Disable Ger operator of OUT_PROD on 310p device ( #17563 )
2025-12-02 20:35:23 +08:00
Daniel Bevenius
7f3a72a8ed
ggml : remove redundant n_copies check when setting input/output ( #17612 )
This commit removes a redundant check for sched->n_copies > 1 when
setting input and output flags on tensor copies in
ggml_backend_sched_split_graph.
The motivation for this change is to clarify the code as the outer if
statement already performs this check.
2025-12-02 12:52:45 +01:00
Eric Curtin
b9a37717b0
codeowners : remove ericcurtin ( #17658 )
Taking a break from llama.cpp. I wasn't around at the start of llama.cpp
but I want to thank @ggerganov and @slaren for creating a neat community
here.
Signed-off-by: Eric Curtin <eric.curtin@docker.com>
2025-12-02 12:18:15 +01:00
Adrien Gallouët
f3a9674ae8
llama : fix signed comparison warning on FreeBSD ( #17497 )
This ensures correct RLIM_INFINITY handling and compatibility on all platforms (32/64-bit).
warning: comparison of integers of different signs: 'rlim_t' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
488 | if (suggest && (lock_limit.rlim_max > lock_limit.rlim_cur + size)) {
| ~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-02 12:05:38 +01:00
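One way to express the warned-about check without mixing signed and unsigned operands is sketched below. This is a hypothetical helper, not the patch from this commit: `suggest_raising_limit` is an assumed name, and the sketch assumes the intent is "the hard limit leaves more than `size` bytes of headroom above the soft limit", with RLIM_INFINITY handled explicitly since `rlim_t` is signed on FreeBSD and unsigned on Linux.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/resource.h>

// Hypothetical helper: should we suggest raising the soft memlock limit so that
// `size` more bytes can be locked? Mirrors the intent of
// `lock_limit.rlim_max > lock_limit.rlim_cur + size` without a signed/unsigned
// comparison.
bool suggest_raising_limit(rlim_t cur, rlim_t max, size_t size) {
    if (cur == RLIM_INFINITY) {
        return false;  // the soft limit is already unlimited
    }
    if (max == RLIM_INFINITY) {
        return true;   // the hard limit leaves room to raise the soft limit
    }
    // Compare in a single unsigned 64-bit type; subtracting instead of adding
    // also avoids overflow when cur + size would exceed the type's range.
    const uint64_t ucur = static_cast<uint64_t>(cur);
    const uint64_t umax = static_cast<uint64_t>(max);
    return umax > ucur && umax - ucur > static_cast<uint64_t>(size);
}
```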
Xuan-Son Nguyen
2c453c6c77
convert: add error message for mistral3 quantized weight ( #17686 )
2025-12-02 11:48:31 +01:00
Xuan-Son Nguyen
5d6bd842ea
server: remove default "gpt-3.5-turbo" model name ( #17668 )
* server: remove default "gpt-3.5-turbo" model name
* do not reflect back model name from request
* fix test
2025-12-02 11:38:57 +01:00
senhtry
fd3abe849e
server: fixing naming conflict res_error in server-models.cpp ( #17679 )
2025-12-02 11:18:39 +01:00
Xuan-Son Nguyen
682e6658bb
server: explicitly set exec path when creating a new instance ( #17669 )
* Revert "rm unused fn"
This reverts commit f2dbe9c087.
* server: explicitly set exec path when creating a new instance
* put back TODO
* only call get_server_exec_path() once
* add fallback logic
2025-12-02 10:25:11 +01:00
Adrien Gallouët
4574f2949e
ci : skip winget update when not in ggml-org ( #17465 )
Prevent forks from generating daily failure notifications.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-02 10:15:01 +01:00
Adrien Gallouët
ab6726eeff
ggml : add fallback definition for HWCAP2_SVE2 ( #17683 )
This aligns with the other HWCAP2 feature flags.
See #17528
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-02 10:41:26 +02:00
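A fallback definition of this kind typically follows the `#ifndef` pattern sketched below, so the code still compiles against older kernel or libc headers that lack the flag. The value `(1 << 1)` matches the arm64 Linux UAPI header; `hwcap2_has_sve2` is a hypothetical helper for illustration, not a function from ggml.

```cpp
#include <cassert>

#if defined(__linux__)
#include <sys/auxv.h>  // on aarch64 this may already define HWCAP2_* flags
#endif

// Fallback for older headers that predate the SVE2 flag.
// On arm64 Linux, HWCAP2_SVE2 is bit 1 of the AT_HWCAP2 auxiliary vector entry.
#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2 (1 << 1)
#endif

// Hypothetical helper: test the SVE2 bit in an AT_HWCAP2 value,
// such as one returned by getauxval(AT_HWCAP2) on aarch64 Linux.
bool hwcap2_has_sve2(unsigned long hwcap2) {
    return (hwcap2 & HWCAP2_SVE2) != 0;
}
```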
Aleksander Grygier
cee92af553
Add context info to server error ( #17663 )
* fix: Add context info to server error
* chore: update webui build output
2025-12-02 09:20:57 +01:00
Aman Gupta
ed32089927
ggml-cuda: reorder only relevant nodes ( #17639 )
2025-12-02 12:36:31 +08:00
Aaron Teo
7b6d745364
release: fix duplicate libs, store symbolic links ( #17299 )
2025-12-02 11:52:05 +08:00
Neo Zhang Jianyu
98bd9ab1e4
enhance argsort for UT ( #17573 )
Co-authored-by: Neo Zhang <zhang.jianyu@outlook.com>
2025-12-02 08:56:46 +08:00
Piotr Wilkin (ilintar)
746f9ee889
Override SSM_A op for Qwen3 Next to reduce splits ( #17587 )
* Override SSM_A op for Qwen3 Next to reduce splits
* New tensor mapping SSM_A_NOSCAN for SSM_A used outside of OP_SSM_SCAN context.
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-02 00:43:13 +01:00
Jeff Bolz
9810cb8247
ops.md: update vulkan support ( #17661 )
2025-12-01 15:26:21 -06:00
Xuan-Son Nguyen
ecf74a8417
mtmd: add mtmd_context_params::warmup option ( #17652 )
* mtmd: add mtmd_context_params::warmup option
* reuse the common_params::warmup
2025-12-01 21:32:25 +01:00
Gilad S.
00c361fe53
fix: llama arch implementation ( #17665 )
2025-12-01 21:21:13 +01:00
Xuan-Son Nguyen
ec18edfcba
server: introduce API for serving / loading / unloading multiple models ( #17470 )
* server: add model management and proxy
* fix compile error
* does this fix windows?
* fix windows build
* use subprocess.h, better logging
* add test
* fix windows
* feat: Model/Router server architecture WIP
* more stable
* fix unsafe pointer
* also allow terminate loading model
* add is_active()
* refactor: Architecture improvements
* tmp apply upstream fix
* address most problems
* address thread safety issue
* address review comment
* add docs (first version)
* address review comment
* feat: Improved UX for model information, modality interactions etc
* chore: update webui build output
* refactor: Use only the message data `model` property for displaying model used info
* chore: update webui build output
* add --models-dir param
* feat: New Model Selection UX WIP
* chore: update webui build output
* feat: Add auto-mic setting
* feat: Attachments UX improvements
* implement LRU
* remove default model path
* better --models-dir
* add env for args
* address review comments
* fix compile
* refactor: Chat Form Submit component
* add endpoint docs
* Merge remote-tracking branch 'webui/allozaur/server_model_management_v1_2' into xsn/server_model_maagement_v1_2
Co-authored-by: Aleksander <aleksander.grygier@gmail.com>
* feat: Add copy to clipboard to model name in model info dialog
* feat: Model unavailable UI state for model selector
* feat: Chat Form Actions UI logic improvements
* feat: Auto-select model from last assistant response
* chore: update webui build output
* expose args and exit_code in API
* add note
* support extra_args on loading model
* allow reusing args if auto_load
* typo docs
* oai-compat /models endpoint
* cleaner
* address review comments
* feat: Use `model` property for displaying the `repo/model-name` naming format
* refactor: Attachments data
* chore: update webui build output
* refactor: Enum imports
* feat: Improve Model Selector responsiveness
* chore: update webui build output
* refactor: Cleanup
* refactor: Cleanup
* refactor: Formatters
* chore: update webui build output
* refactor: Copy To Clipboard Icon component
* chore: update webui build output
* refactor: Cleanup
* chore: update webui build output
* refactor: UI badges
* chore: update webui build output
* refactor: Cleanup
* refactor: Cleanup
* chore: update webui build output
* add --models-allow-extra-args for security
* nits
* add stdin_file
* fix merge
* fix: Retrieve lost setting after resolving merge conflict
* refactor: DatabaseStore -> DatabaseService
* refactor: Database, Conversations & Chat services + stores architecture improvements (WIP)
* refactor: Remove redundant settings
* refactor: Multi-model business logic WIP
* chore: update webui build output
* feat: Switching models logic for ChatForm or when regenerating messages + modality detection logic
* chore: update webui build output
* fix: Add `untrack` inside chat processing info data logic to prevent infinite effect
* fix: Regenerate
* feat: Remove redundant settings + rearrange
* fix: Audio attachments
* refactor: Icons
* chore: update webui build output
* feat: Model management and selection features WIP
* chore: update webui build output
* refactor: Improve server properties management
* refactor: Icons
* chore: update webui build output
* feat: Improve model loading/unloading status updates
* chore: update webui build output
* refactor: Improve API header management via utility functions
* remove support for extra args
* set hf_repo/docker_repo as model alias when possible
* refactor: Remove ConversationsService
* refactor: Chat requests abort handling
* refactor: Server store
* tmp webui build
* refactor: Model modality handling
* chore: update webui build output
* refactor: Processing state reactivity
* fix: UI
* refactor: Services/Stores syntax + logic improvements
Refactors components to access stores directly instead of using exported getter functions.
This change centralizes store access and logic, simplifying component code and improving maintainability by reducing the number of exported functions and promoting direct store interaction.
Removes exported getter functions from `chat.svelte.ts`, `conversations.svelte.ts`, `models.svelte.ts` and `settings.svelte.ts`.
* refactor: Architecture cleanup
* feat: Improve statistic badges
* feat: Condition available models based on modality + better model loading strategy & UX
* docs: Architecture documentation
* feat: Update logic for PDF as Image
* add TODO for http client
* refactor: Enhance model info and attachment handling
* chore: update webui build output
* refactor: Components naming
* chore: update webui build output
* refactor: Cleanup
* refactor: DRY `getAttachmentDisplayItems` function + fix UI
* chore: update webui build output
* fix: Modality detection improvement for text-based PDF attachments
* refactor: Cleanup
* docs: Add info comment
* refactor: Cleanup
* re
* refactor: Cleanup
* refactor: Cleanup
* feat: Attachment logic & UI improvements
* refactor: Constants
* feat: Improve UI sidebar background color
* chore: update webui build output
* refactor: Utils imports + move types to `app.d.ts`
* test: Fix Storybook mocks
* chore: update webui build output
* test: Update Chat Form UI tests
* refactor: Tooltip Provider from core layout
* refactor: Tests to separate location
* decouple server_models from server_routes
* test: Move demo test to tests/server
* refactor: Remove redundant method
* chore: update webui build output
* also route anthropic endpoints
* fix duplicated arg
* fix invalid ptr to shutdown_handler
* server : minor
* rm unused fn
* add ?autoload=true|false query param
* refactor: Remove redundant code
* docs: Update README documentation + architecture & data flow diagrams
* fix: Disable autoload on calling server props for the model
* chore: update webui build output
* fix ubuntu build
* fix: Model status reactivity
* fix: Modality detection for MODEL mode
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-01 19:41:04 +01:00
Xuan-Son Nguyen
7733409734
common: improve verbosity level definitions ( #17630 )
* common: improve verbosity level definitions
* string_format
* update autogen docs
2025-12-01 14:38:13 +01:00
Xuan-Son Nguyen
cd3c118908
model: support Ministral3 ( #17644 )
* conversion script
* support ministral 3
* maybe this is better?
* add TODO for rope_yarn_log_mul
* better ppl (tested on 14B-Instruct)
* Add Ministral3 support to Mistral format
* improve arch handling
* add sizes
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* nits
---------
Co-authored-by: Julien Denize <julien.denize@mistral.ai>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-01 12:26:52 +01:00
Georgi Gerganov
649495c9d9
metal : add FA head size 48 ( #17619 )
2025-12-01 12:49:53 +02:00
Georgi Gerganov
90c72a614a
ggml : extend the GGML_SCHED_NO_REALLOC debug logic of the scheduler ( #17617 )
2025-12-01 12:49:33 +02:00
Aman Gupta
6eea666912
llama-graph: avoid expand_forward for fusion ( #17633 )
2025-12-01 11:12:48 +02:00
Xuan-Son Nguyen
ff90508d68
contributing: update guidelines for AI-generated code ( #17625 )
* contributing: update guidelines for AI-generated code
* revise
2025-11-30 22:51:34 +01:00
Adrien Gallouët
0a4aeb927d
cmake : add option to build and link LibreSSL ( #17552 )
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-30 22:14:32 +01:00
Tarek Dakhran
2ba719519d
model: LFM2-VL fixes ( #17577 )
* Adjust to pytorch
* Add antialiasing upscale
* Increase number of patches to 1024
* Handle default marker insertion for LFM2
* Switch to flag
* Reformat
* Cuda implementation of antialias kernel
* Change placement in ops.cpp
* consistent float literals
* Pad only for LFM2
* Address PR feedback
* Rollback default marker placement changes
* Fall back to the CPU implementation for the antialias variant of upscale
2025-11-30 21:57:31 +01:00
Xuan-Son Nguyen
7f8ef50cce
clip: fix nb calculation for qwen3-vl ( #17594 )
2025-11-30 15:33:55 +01:00
Xuan-Son Nguyen
3c136b21a3
cli: add migration warning ( #17620 )
2025-11-30 15:32:43 +01:00
Adrien Gallouët
beb1f0c503
common : throttle download progress output to reduce IO flush ( #17427 )
This change limits progress updates to approximately every 0.1% of the
file size to minimize stdio overhead.
Also fixes compiler warnings regarding __func__ in lambdas.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-30 14:22:44 +02:00
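The throttling idea (report roughly every 0.1% of the file, i.e. every `total / 1000` bytes) can be sketched as below. `ProgressThrottle` is a hypothetical class for illustration, not the code in `common`; it assumes the caller knows the total file size and calls `should_report` after each received chunk.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical throttle: report progress only when at least ~0.1% of the file
// has arrived since the last report, and always at completion.
class ProgressThrottle {
public:
    explicit ProgressThrottle(size_t total)
        : total_(total), step_(total / 1000 > 0 ? total / 1000 : 1), next_(0) {}

    // Returns true if the caller should print (and flush) a progress line now.
    bool should_report(size_t downloaded) {
        if (downloaded >= total_ || downloaded >= next_) {
            next_ = downloaded + step_;
            return true;
        }
        return false;
    }

private:
    size_t total_;  // expected file size in bytes
    size_t step_;   // minimum bytes between reports (~0.1% of total)
    size_t next_;   // byte count at which the next report is allowed
};
```

For a 1,000,000-byte file received in 100-byte chunks, this reports 1,001 times (every 1,000 bytes plus the final chunk) instead of 10,000, which is where the reduction in stdio flushes comes from.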
Aaron Teo
def5404f26
common: add LLAMA_LOG_FILE env var ( #17609 )
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-11-30 12:12:32 +01:00
Gilad S.
fa0465954f
ggml: fix: macOS build with `-DGGML_BACKEND_DL=ON` ( #17581 )
2025-11-30 10:00:59 +08:00
ddh0
5a6241feb0
common: update env var name ( #17588 )
2025-11-30 09:59:25 +08:00
Aman Gupta
c7af376c29
CUDA: add stream-based concurrency ( #16991 )
* CUDA: add stream-based concurrency
* HIP: fix hipStreamWaitEvent define and nodiscard warnings
* ggml-cuda: fix fusion inside stream
* ggml-cuda: fix bug w.r.t first stream launch
* ggml-cuda: format
* ggml-cuda: improve assert message
* ggml-cuda: use lambda instead of duplicating code
* ggml-cuda: add some more comments
* ggml-cuda: add more detailed comments about concurrency
* ggml-cuda: rename + remove unused var
* ggml-cuda: fix condition for stream launch
* ggml-cuda: address review comments, add destructor
* common.cuh: add is_valid for concurrent events
* common.cuh: make comment better
* update comment
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* update comment
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* common.cuh: fix lower_bound condition + remove join_node data from write_ranges
* ggml-cuda: fix overlap condition + shadowing parameter
---------
Co-authored-by: Carl Philipp Klemm <carl@uvos.xyz>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-11-30 08:17:55 +08:00
Mahekk Shaikh
00425e2ed1
cuda : add error checking for cudaMemcpyAsync in argsort ( #17599 )
* cuda : add error checking for cudaMemcpyAsync in argsort (#12836 )
* fix indentation
2025-11-30 08:16:28 +08:00
Acly
385c3da5e6
vulkan : fix FA mask load with bounds check (coopmat2) ( #17606 )
2025-11-30 01:03:21 +01:00
Xuan-Son Nguyen
ab49f094d2
server: move server-context to its own cpp|h ( #17595 )
* git mv
* add server-context.h
* add server-context.h
* clean up headers
* cont : cleanup
* also expose server_response_reader (to be used by CLI)
* fix windows build
* decouple server_routes and server_http
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-29 22:04:44 +01:00
Haiyue Wang
8c32d9d96d
server: explicitly set the function name in lambda ( #17538 )
As [1] explains, the actual debug message looks like:
"res operator(): operator() : queue result stop"
With the name set explicitly, the message is easier to debug:
"res operator(): recv : queue result stop"
The remaining "operator()" on the left is generated by 'RES_DBG() ... __func__'.
[1]: https://clang.llvm.org/extra/clang-tidy/checks/bugprone/lambda-function-name.html
Signed-off-by: Haiyue Wang <haiyuewa@163.com>
2025-11-29 18:43:29 +01:00
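The problem and the fix can be reproduced with a simplified, hypothetical logging macro. `RES_DBG` and `format_res_dbg` below are stand-ins, not the server's actual definitions: the left slot is filled by `__func__` at the expansion site, while the right slot takes an explicit name instead of a second `__func__`.

```cpp
#include <cassert>
#include <string>

// Hypothetical formatter: `func` is the enclosing function's __func__,
// `name` is a label chosen explicitly by the caller.
static std::string format_res_dbg(const char * func, const char * name, const std::string & msg) {
    return std::string("res ") + func + ": " + name + " : " + msg;
}

// __func__ expands in the scope where the macro is used.
#define RES_DBG(name, msg) format_res_dbg(__func__, (name), (msg))

std::string run_recv_example() {
    // Inside a lambda, __func__ refers to the lambda's call operator
    // (GCC and Clang report "operator()"), so it cannot identify the lambda.
    // Passing "recv" explicitly keeps the log line meaningful.
    auto recv = []() {
        return RES_DBG("recv", "queue result stop");
    };
    return recv();
}
```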
Igor Smirnov
0874693b44
common : fix json schema with '\' in literals ( #17307 )
* Fix json schema with '\' in literals
* Add "literal string with escapes" test
2025-11-29 17:06:32 +01:00
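The class of bug fixed here can be illustrated with a hypothetical escaping helper: when a schema literal is emitted as a quoted grammar string, a backslash must itself be escaped, or it is read as the start of an escape sequence. `format_grammar_literal` is an assumed name, and the exact set of escaped characters is an assumption, not the code from this commit.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wrap a JSON-schema literal in double quotes for a
// GBNF-style grammar, escaping characters that would otherwise be misread.
std::string format_grammar_literal(const std::string & s) {
    std::string out = "\"";
    for (char c : s) {
        switch (c) {
            case '\\': out += "\\\\"; break;  // a lone backslash would start an escape
            case '"':  out += "\\\""; break;  // an unescaped quote would end the literal
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            default:   out += c;      break;
        }
    }
    out += "\"";
    return out;
}
```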
Neo Zhang
7d2add51d8
sycl : support allocating more than 4GB of device memory, update the doc and script ( #17566 )
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2025-11-29 14:59:44 +02:00