Commit Graph

6660 Commits

Author SHA1 Message Date
Adrien Gallouët 4201deae9c
common: introduce http.h for httplib-based client (#16373)
* common: introduce http.h for httplib-based client

This change moves cpp-httplib based URL parsing and client setup into
a new header `common/http.h`, and integrates it in `arg.cpp` and `run.cpp`.

It is an iteration towards removing libcurl, while intentionally
minimizing changes to existing code to guarantee the same behavior when
`LLAMA_CURL` is used.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* tools : add missing WIN32_LEAN_AND_MEAN

Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
2025-10-01 20:22:18 +03:00
Aleksander Grygier 764799279f
Conversation action dialogs as singletons from Chat Sidebar + apply conditional rendering for Actions Dropdown for Chat Conversation Items (#16369)
* fix: Render Conversation action dialogs as singletons from Chat Sidebar level

* chore: update webui build output

* fix: Render Actions Dropdown conditionally only when user hovers conversation item + remove unused markup

* chore: Update webui static build

* fix: Always truncate conversation names

* chore: Update webui static build
2025-10-01 18:18:10 +02:00
Aleksander Grygier 2a9b63383a
Improve code block color theming (#16325)
* feat: Improve code block theming

* chore: update webui build output

* chore: Update webui static build
2025-10-01 15:54:42 +02:00
Sigbjørn Skjæret 1104ca1a1c
ci : use registry cache for docker builds (#16366) 2025-10-01 14:09:52 +02:00
Aleksander Grygier 4f1575921c
Add optional setting for showing "Model used:" information (#16337)
* feat: Add a setting to include model name used to generate the message

* feat: UI improvements

* feat: Save model info along with the database message entry creation

* chore: Build webui static output
2025-10-01 12:08:16 +02:00
Eve 132d673554
vulkan: make ggml_vk_default_dispatcher support older vulkan headers (#16345)
* make ggml_vk_default_dispatcher support older vulkan headers

* simpilfy with using
2025-10-01 09:56:36 +02:00
Aleksander Grygier aa9538a63a
webui: Remove running `llama-server` within WebUI `dev.sh` script (#16363) 2025-10-01 08:40:26 +03:00
Bartowski e74c92e842
model : support GLM 4.6 (make a few NextN/MTP tensors not required) (#16359)
* Make a few GLM tensors not required

layer.nextn.shared_head_head and layer.nextn.embed_tokens are both excluded from GLM 4.6 resulting in the model not loading after conversion/quantization, this marks those tensors as not required which makes it work

* Update llama-model.cpp

layer.nextn.shared_head_norm also not required in case of future models
2025-09-30 22:24:36 +02:00
Sigbjørn Skjæret b2ba81dbe0
ci : fix ccache key for ubuntu-cpu-cmake (#16355)
* fix ccache key for ubuntu-cpu-cmake

* set it for release as well [no ci]
2025-09-30 21:41:42 +02:00
Adrien Gallouët bf6f3b3a19
common : disable progress bar without a tty (#16352)
* common : disable progress bar without a tty

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Add missing headers

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-09-30 20:52:41 +03:00
lhez 7c156df414
opencl: support pad_ext (#15888) 2025-09-30 10:45:45 -07:00
Pascal 16b0ca0d2e
Chatapi ignore empty sampling (#16330)
* fix: skip empty sampling fields instead of coercing to 0 in chat API options

* chore: update webui build output
2025-09-30 19:18:54 +02:00
Reese Levine 8d78cd2613
ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187)
* Work on rope

* Simplify inplace operation generation and combine mul/add generation

* Work on rope variants

* implement neox rope

* rope complete

* Add sub,div,glu operators

* implement scale op

* Update cpy shader to handle cont/more types

* formatting

* Update test vars printing for rope,rms_norm

* Avoid ROPE hardcoded constants

* Add TODO to change ROPE constants to enum

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix TODO comment

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-09-30 09:57:51 -07:00
lhez d1c84a662d
opencl: support ne3 in get_rows (#15866) 2025-09-30 09:55:13 -07:00
Adrien Gallouët 364a7a6d4a
common : remove common_has_curl() (#16351)
`test-arg-parser.cpp` has been updated to work consistently,
regardless of whether CURL or SSL support is available, and
now always points to `ggml.ai`.

The previous timeout test has been removed, but it can be
added back by providing a dedicated URL under `ggml.ai`.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-09-30 17:39:44 +03:00
Sigbjørn Skjæret 2df5bcf357
ci : disable ccache for android (#16348) 2025-09-30 15:38:01 +02:00
Georgi Gerganov 075c01567b ggml : bump version to 0.9.4 (ggml/1363) 2025-09-30 13:53:55 +03:00
anavp-nvidia a014310374
cuda : Enable CUDA Graph usage for Nemotron Nano v2 (NemotronH) (#16328)
* Fix Nemotron Nano v2 9B not executing as CUDA Graph on NVIDIA GPUs

* fix to ensure test-backend-ops check passes
2025-09-30 11:13:22 +03:00
Georgi Gerganov 35fb82497e
metal : dynamic simdgroups for MV kernels (#16340)
* metal : dynamic simdgroups for MV kernels

* cont : minor
2025-09-30 11:03:23 +03:00
Adrien Gallouët 3c62aed89f
common : simplify etag tracking by removing json (#16342)
The JSON parser is temporarily kept only for backward compatibility. It
reads the etag from old .json files to prevent unnecessary re-downloads
for existing users.

This legacy code can be removed in a future version.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-09-30 10:36:33 +03:00
Charles Xu f1eb1cb1eb
kleidiai : fix work size and threads sync for fp16 (#16246) 2025-09-30 10:07:20 +03:00
lhez de41f2b7bf
codeowners: add codeowners for opencl backend (#16344) 2025-09-30 08:30:16 +03:00
Jeff Bolz a74a0d69f3
tests: override test_set_rows::max_nmse_err to allow for occasional rounding differences (#16295)
* tests: override test_set_rows::max_nmse_err to allow for occasional rounding differences

* apply similar error bounds to test_cpy
2025-09-29 19:26:34 -05:00
Pascal 5f7e166cbf
Fix thinking blocks with quotes + add handling `[THINK]...[/THINK]` blocks (#16326)
* fix: prevent reasoning blocks with quotes from being truncated

* chore: update webui build output

* feat: Improve thinking content parsing

* test: Adds ChatMessage component stories for different thinking blocks

* chore: update webui build output

* fix: ChatMessage story fix

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-09-29 18:49:47 +02:00
Georgi Gerganov d72f5f7ba2
ci : add AMD runners and workflows (#16249)
* ci : add AMD runners and workflows

* ci : move AMD jobs to separate workflow

* cont : fix paths
2025-09-29 17:51:48 +03:00
alex-spacemit b77e6c18e1
ggml: riscv: add riscv spacemit backend (#15288)
* ggml: add spacemit backend

Change-Id: I249bdc043485d815a9c351867137bc1e27cc2e23

* add new line at end of file

Change-Id: I889ed1c85fb45e62350ecde0c06f70450cadfbe2

* add riscv zba extension limit

Change-Id: I321eb200f859751727afe5cae13074dfce2bb0ce

* fixed for review comments, file renamed and format

Change-Id: Ia20b6ec24a36638e62e0fe07cf100916a7cce3ce

* fixed for code format, after clang-format

Change-Id: I5dc33a0412da3d3f2d77075d8939185d3009eca2

* use _Float16 instead of __fp16

Change-Id: I039fb02bb95270e641bc4442204e658735859d43

* add ci for riscv64-spacemit-ime-native

Change-Id: I711c1033061df1a289ea77891b2997599dfe8279

* update debian-13-riscv64-spacemit-ime-native ci label

Change-Id: Ifb2b891e2fca57b5da604fce2ac255f27731179a

* remove license comment for spacemit ime

Change-Id: If0dc3ca30a958631ccca0a28b62e0b825f9fb0c3

* upgrade binutils for gcc ime

Change-Id: Ibf2fa74c1064408974cb5b45f044d40987e5fb45

* add spacemit ime cross jobs

Change-Id: I80d74909941d41cb9cd09e51d8baf01c985cbfc6

* remove native compile for riscv64-spacemit-ime

Change-Id: I01920afafdc73fa7424014fd648d243f8ec9e25e

* ci : add caching for spacemit ime cross toolchain

Change-Id: Ic54a192019a2fd982bbd58225ce3bbc38f4053de

* ci: bug fixed for cache path and env

Change-Id: I28c42e10b6fff053bb6580926ca2353448cb042a

* Update .github/workflows/build-linux-cross.yml for cache path

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* bugfixed for  build-linux-cross.yml,  syntax error

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: cailinxi <linxi.cai@spacemit.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-09-29 17:50:44 +03:00
Georgi Gerganov 2ddd3f2356 sync : ggml 2025-09-29 17:43:58 +03:00
Georgi Gerganov 4d3d455d3c sync : whisper.cpp (ggml/1359)
* ggml : Fix MKL detection by quoting BLAS_INCLUDE_DIRS (whisper/3426)

* sync : whisper.cpp
2025-09-29 17:43:58 +03:00
Daniel Bevenius c9b1c06467 ggml : remove -dev suffix from release version (ggml/1355)
This commit removes the `-dev` suffix from the version string in
CMakeLists.txt and the release script. The version will now be
just be formatted as `MAJOR.MINOR.PATCH`.
2025-09-29 17:43:58 +03:00
Daniel Bevenius b6ae75afb4 ggml : bump version to 0.9.3 (ggml/1353) 2025-09-29 17:43:58 +03:00
Georgi Gerganov b6dff20e2f ggml : prepare for development of 0.9.2-dev 2025-09-29 17:43:58 +03:00
Georgi Gerganov 2db78c75e4 ggml : bump version to 0.9.1 2025-09-29 17:43:58 +03:00
Rafal Lewczuk 02463ab27b
ggml-backend : add root cause in error message if loading backend library fails (#16172)
This PR adds additional information to an error message when loading backend library via ld_load_library() fails. This helps spotting why backend library did not load (missing library, missing dependency or unresolved symbol etc.).
2025-09-29 13:17:09 +02:00
Sigbjørn Skjæret adc76347d7
ggml : check cuda and metal argsort limits and add test (#16323)
* check cuda argsort limits and add test

* add metal check
2025-09-29 11:09:00 +02:00
Aleksander Grygier 3a2bdcda0b
Improve Mobile UI for dialogs and action dropdowns (#16222)
* fix: Always show conversation item actions

* feat: Improve Alert Dialog and Dialog mobile UI

* feat: Add settings reset to default confirmation

* fix: Close Edit dialog on save

* chore: update webui build output

* webui: implement proper z-index system and scroll management

- Add CSS variable for centralized z-index control
- Fix dropdown positioning with Settings dialog conflicts
- Prevent external scroll interference with proper event handling
- Clean up hardcoded z-index values for maintainable architecture

* webui: ensured the settings dialog enforces dynamic viewport height on mobile while retaining existing desktop sizing overrides

* feat: Use `dvh` instead of computed px height for dialogs max height on mobile

* chore: update webui build output

* feat: Improve Settings fields UI

* chore: update webui build output

* chore: update webui build output

---------

Co-authored-by: Pascal <admin@serveurperso.com>
2025-09-29 10:37:20 +02:00
Pascal 66bb7985c3
fix: preserved zero values in chat settings inputs and textareas by switching to nullish coalescing for field values and default placeholders (#16312) 2025-09-29 09:08:41 +02:00
Vinkal 2f61c0f5bf
llama-cli: prevent spurious assistant token (#16202)
* tools/main: llama-cli: prevent spurious assistant token (#13402)

During prompt ingestion, prompt tokens are accepted into the sampler history (for repetition penalties). The conversation-mode path then appended `common_sampler_last(smpl)` to `assistant_ss` before any new token was sampled. At that point, "last" was a prompt-side token (e.g., an input prefix), so the assistant chat message began with an extra piece.

Fix: append to `assistant_ss` only for a newly sampled (non-EOG) token. This affects only chat message assembly (`assistant_ss` / `chat_msgs` / `common_chat_format_single`); terminal stdout is unchanged. Sampling order/logits are unchanged.

Fixes #13402.

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>

* Update tools/main/main.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* tools/main: remove outdated comment

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>

---------

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-09-29 10:03:12 +03:00
ddh0 3ffd0fae47
perplexity : show more kl-divergence data (#16321)
Adds additional percentile data for displayed in the output of `llama-perplexity --kl-divergence`:
- Added 95 percentile (mirroring existing 5 percentile)
- Added 0.1 percentile (mirroring existing 99.9 percentile)
2025-09-29 09:30:45 +03:00
Georgi Gerganov a4a0aa5ea2
ggml : fix dependencies for ggml_set_rows (#16318) 2025-09-29 08:41:28 +03:00
Jeff Bolz 92cd103f62
vulkan: Fix validation failure in quantized flash attention (#16292) 2025-09-29 06:50:37 +02:00
Sigbjørn Skjæret b887d2f341
ggml : fix GGML_F32_VEC_FMA argument order in ggml_vec_mad1_f32 (#16307)
* fix GGML_F32_VEC_FMA argument order in ggml_vec_mad1_f32

* add test that fails on simd
2025-09-28 23:15:03 +02:00
crat0z bd0af02fc9
common : fix reasoning before forced tool call via tool_choice = required (#16264)
* common : fix reasoning before forced tool call via tool_choice = required

* common : improve reasoning and commentary handling when tool_choice is required

(cherry picked from commit c746984956d6882c2de73d53ae2bb3bdf889e475)

---------

Co-authored-by: Alde Rojas <hello@alde.dev>
2025-09-28 21:13:50 +03:00
R0CKSTAR d9e0e7c819
ci : fix musa docker build (#16306)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2025-09-28 16:38:15 +02:00
Aaron Teo 0124ac989f
devops: switch to using ubuntu-22.04-s390x image (#16302)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-09-28 19:25:58 +08:00
Imad Saddik 2811c65286
Fixed a few typos in the README of the LLaMA.cpp HTTP Server [no ci] (#16297) 2025-09-28 13:04:46 +02:00
Jeff Bolz d8359f5fde
vulkan: 64-bit im2col (#16135)
* vulkan: 64-bit im2col

Add variants of the im2col shaders that use buffer_device_address/buffer_reference,
and use 64-bit address calculations. This is needed for large convolutions used in
stable-diffusion.cpp.

* fix validation error for large im2col
2025-09-28 08:38:37 +02:00
Georgi Gerganov 6a2c6145a0
metal : extend mat-mat multiplication support (#16225)
* metal : support mul_mm with src1->type == GGML_TYPE_F16

* metal : support mul_mm_id with src1->type == GGML_TYPE_F16

[no ci]

* metal : mul_mm support ne00 % 32 != 0

* metal : support mul_mm_id with ne00 % 32 != 0

* cont : remove unnecessary unrolls

* cont : simplify data loading

* metal : optimize mul_mm when output bounds checks are not needed
2025-09-28 09:34:44 +03:00
Georgi Gerganov 3b53634fe3
metal : fuse non-sequential nodes (#16102)
* metal : fuse non-sequential nodes

* cont : add comment

* cont : simplify bounds checks
2025-09-28 09:34:05 +03:00
Jeff Bolz 1384abf8b8
vulkan: handle mat_mul with A matrix > 4GB (#16176)
* vulkan: handle mat_mul with A matrix > 4GB

This change splits mat_mul operations with huge A matrix into chunks in the M
dimension. This works well for stable-diffusion use cases where the im2col
matrix has very large M.

Fix the order of setting the stride in mul_mm_cm2 - setting the dimension
clobbers the stride, so stride should be set after.

* build fixes
2025-09-27 20:36:34 -05:00
Jeff Bolz e6d65fb02d
vulkan: support arbitrary KV dimension in flash attention (#16160)
The "Clamp" spec constant is already based on whether KV is a multiple of Bc,
so use that to control whether bounds checking is performed. Add bounds checking
to the scalar and coopmat1 paths. Coopmat2 didn't need any changes (the K/V
tensors are already optionally clamped, nothing else needed to be changed).
2025-09-27 22:43:39 +02:00