Commit Graph

7360 Commits

Author SHA1 Message Date
Xuan Son Nguyen f2dbe9c087 rm unused fn 2025-12-01 14:20:43 +01:00
Georgi Gerganov d182544c99 server : minor 2025-12-01 13:35:40 +02:00
Xuan Son Nguyen 4a1c05c383 fix invalid ptr to shutdown_handler 2025-11-30 15:31:05 +01:00
Xuan Son Nguyen 7b28b5e16a fix duplicated arg 2025-11-30 14:53:47 +01:00
Xuan Son Nguyen 802e77eaf4 Merge remote-tracking branch 'webui/allozaur/server_model_management_v1_2' into xsn/server_model_management_v1_2 2025-11-29 23:54:34 +01:00
Xuan Son Nguyen 23cb411317 also route anthropic endpoints 2025-11-29 23:29:06 +01:00
Aleksander Grygier e8b9d74b3b chore: update webui build output 2025-11-29 23:18:45 +01:00
Aleksander Grygier acd3c58152 refactor: Remove redundant method 2025-11-29 23:18:24 +01:00
Aleksander Grygier 360a5ed62b test: Move demo test to tests/server 2025-11-29 23:17:34 +01:00
Xuan Son Nguyen a82dbbfb30 decouple server_models from server_routes 2025-11-29 23:00:35 +01:00
Xuan Son Nguyen c1dfccd078 Merge branch 'master' into xsn/server_model_management_v1_2 2025-11-29 22:34:16 +01:00
Xuan-Son Nguyen ab49f094d2 server: move server-context to its own cpp|h (#17595)
* git mv

* add server-context.h

* add server-context.h

* clean up headers

* cont : cleanup

* also expose server_response_reader (to be used by CLI)

* fix windows build

* decouple server_routes and server_http

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-29 22:04:44 +01:00
Aleksander Grygier 6fd720e742 Merge remote-tracking branch 'origin/allozaur/server_model_management_v1_2' into allozaur/server_model_management_v1_2 2025-11-29 21:59:33 +01:00
Aleksander Grygier ae8a1e8137 refactor: Tests to separate location 2025-11-29 21:44:57 +01:00
Aleksander Grygier 949b5fd63e refactor: Tooltip Provider from core layout 2025-11-29 21:41:36 +01:00
Aleksander Grygier 4f39da823f test: Update Chat Form UI tests 2025-11-29 20:13:11 +01:00
Aleksander Grygier 33b9cc40a1 Merge branch 'master' into allozaur/server_model_management_v1_2 2025-11-29 19:40:46 +01:00
Haiyue Wang 8c32d9d96d server: explicitly set the function name in lambda (#17538)
As [1] explained, the real debug message will be like:
	"res    operator(): operator() : queue result stop"

Set the name explicitly, the message is easy for debugging:
	"res    operator(): recv : queue result stop"

The left "operator()" is generated by 'RES_DBG() ... __func__'

[1]: https://clang.llvm.org/extra/clang-tidy/checks/bugprone/lambda-function-name.html

Signed-off-by: Haiyue Wang <haiyuewa@163.com>
2025-11-29 18:43:29 +01:00
Igor Smirnov 0874693b44 common : fix json schema with '\' in literals (#17307)
* Fix json schema with '\' in literals

* Add "literal string with escapes" test
2025-11-29 17:06:32 +01:00
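The escaping problem behind this fix can be illustrated with a small Python sketch. It assumes GBNF string literals use backslash escapes (as in llama.cpp's grammar format) and uses a hypothetical helper, `format_gbnf_literal`; it is not the code changed in #17307. The point is that a backslash inside a schema literal must itself be escaped, or the generated grammar would read it as the start of an escape sequence.

```python
def format_gbnf_literal(literal: str) -> str:
    """Hypothetical helper: quote a JSON-schema string literal for a GBNF grammar.

    Backslash, double quote, and line breaks are escaped; every other
    character is copied through unchanged.
    """
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r"}
    # Map each character independently so an escape inserted for one
    # character is never re-escaped by a later rule.
    return '"' + "".join(escapes.get(ch, ch) for ch in literal) + '"'


print(format_gbnf_literal("a\\b"))       # "a\\b"
print(format_gbnf_literal('say "hi"'))   # "say \"hi\""
```

Mapping characters one at a time sidesteps the ordering pitfall of chained `str.replace` calls, where escaping backslashes after quotes would double-escape the backslash just added for the quote.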
Neo Zhang 7d2add51d8 sycl : support to malloc memory on device more than 4GB, update the doc and script (#17566)
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2025-11-29 14:59:44 +02:00
ixgbe f698a79c63 ggml: replace hwcap with riscv_hwprobe for RVV detection (#17567)
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-29 14:56:31 +02:00
Ruben Ortlam 47a268ea50 Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (#16900)
* vulkan: split mul_mmq_funcs for mul_mat_vecq use

* add mxfp4 mmvq

* add q2_k mmvq

* add q3_k mmvq

* add q4_k and q5_k mmvq

* add q6_k mmvq

* handle 4x4 quants per mmvq thread

* enable MUL_MAT_ID mmvq support

* enable subgroup optimizations for mul_mat_vec_id shaders

* device tuning

* request prealloc_y sync after quantization

* fix indentation

* fix llvmpipe test failures

* fix mul_mat_id mmvq condition

* fix unused variable warning
2025-11-29 09:37:22 +01:00
Jeff Bolz 59d8d4e963 vulkan: improve topk perf for large k, fix overflow in unit tests (#17582) 2025-11-29 08:39:57 +01:00
Aleksander Grygier a568e74c20 chore: update webui build output 2025-11-29 02:40:09 +01:00
Aleksander Grygier 2d556bb93c test: Fix Storybook mocks 2025-11-29 02:36:41 +01:00
Aleksander Grygier 493ef08723 refactor: Utils imports + move types to `app.d.ts` 2025-11-29 02:33:37 +01:00
Aleksander Grygier ce9c9afe0d chore: update webui build output 2025-11-29 01:40:00 +01:00
Aleksander Grygier 2464e06028 feat: Improve UI sidebar background color 2025-11-29 01:39:40 +01:00
Aleksander Grygier 27b152267f refactor: Constants 2025-11-29 01:38:02 +01:00
Aleksander Grygier 648d2deebc feat: Attachment logic & UI improvements 2025-11-29 01:36:05 +01:00
Aleksander Grygier d49d97c642 refactor: Cleanup 2025-11-29 00:51:18 +01:00
Aleksander Grygier f50ce7b5b4 refactor: Cleanup 2025-11-29 00:50:16 +01:00
Aleksander Grygier 4d16459b4c re 2025-11-29 00:49:46 +01:00
Aleksander Grygier c76de5e0ad refactor: Cleanup 2025-11-29 00:49:20 +01:00
Aleksander Grygier 2f97dbfa65 docs: Add info comment 2025-11-29 00:49:03 +01:00
Aleksei Nikiforov d82b7a7c1d gguf-py : fix passing non-native endian tensors (editor-gui and new-metadata) (#17553)
gguf_new_metadata.py reads data from reader.
Reader doesn't byteswap tensors to native endianness.
But writer does expect tensors in native endianness to convert them
into requested endianness.

There are two ways to fix this: update reader and do conversion to native endianness and back,
or skip converting endianness in writer in this particular use case.

gguf_editor_gui.py doesn't allow editing or viewing tensor data.
Let's go with skipping excessive byteswapping.

If eventually capability to view or edit tensor data is added,
tensor data should be instead byteswapped when reading it.
2025-11-28 20:53:01 +01:00
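The double byteswap described above can be reproduced with a minimal Python sketch built on the standard `array` and `struct` modules. The function `write_tensor_bytes` and its `native` flag are hypothetical stand-ins, not gguf-py's real writer API. The writer assumes host-endian input and swaps whenever the target differs from the host, so bytes taken straight from a non-native file get swapped a second time unless the swap is skipped.

```python
import array
import struct
import sys


def write_tensor_bytes(values: array.array, target: str, *, native: bool = True) -> bytes:
    """Hypothetical writer step.

    target: byte order of the output file, "little" or "big".
    native: True if `values` holds host-endian data (the writer's assumption);
            False if its bytes are already in the file's byte order, in which
            case the swap is skipped.
    """
    out = array.array(values.typecode)
    out.frombytes(values.tobytes())  # byte-exact copy; caller's array is untouched
    if native and target != sys.byteorder:
        out.byteswap()
    return out.tobytes()


# Tensor data as stored in a big-endian file, loaded without byteswapping,
# the way a reader that keeps the file's byte order would hand it over.
file_bytes = struct.pack(">2f", 1.0, 2.0)
loaded = array.array("f")
loaded.frombytes(file_bytes)

# On a little-endian host this swaps already-big-endian bytes a second time.
buggy = write_tensor_bytes(loaded, "big")
# Skipping the swap copies the bytes through unchanged.
fixed = write_tensor_bytes(loaded, "big", native=False)

print(struct.unpack(">2f", fixed))  # (1.0, 2.0)
```

On a big-endian host both paths agree; the mismatch only appears when file order and host order differ, which is why the bug was specific to non-native-endian files.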
Aleksander Grygier 1adf173dd6 refactor: Cleanup 2025-11-28 19:36:03 +01:00
Aleksander Grygier dd30810d0a fix: Modality detection improvement for text-based PDF attachments 2025-11-28 19:30:32 +01:00
DAN™ 03914c7ef8 common : move all common_chat_parse_* to chat-parser.cpp. (#17481) 2025-11-28 19:29:36 +01:00
o7si 3ce7a65c2f server: fix: /metrics endpoint returning JSON-escaped Prometheus format (#17386)
* fix: /metrics endpoint returning JSON-escaped Prometheus format

* mod: remove string overload from ok() method
2025-11-28 19:14:00 +01:00
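The symptom named in this fix, Prometheus text returned as a JSON-escaped string, can be shown in a few lines of Python. The helper names and the metric name below are hypothetical and unrelated to the server's actual code. The sketch only demonstrates that passing an already-formatted text payload through a JSON string encoder wraps it in quotes and turns each newline into a literal `\n`, so the body no longer matches the line-oriented Prometheus text format.

```python
import json


def render_metrics(samples: dict[str, float]) -> str:
    # Prometheus text exposition: one "<name> <value>" sample per line.
    return "".join(f"{name} {value}\n" for name, value in samples.items())


def response_body(text: str, json_encode: bool) -> str:
    # The bug pattern: serializing an already-formatted text payload as a
    # JSON string adds surrounding quotes and escapes every newline.
    return json.dumps(text) if json_encode else text


text = render_metrics({"example_requests_total": 3})
print(response_body(text, json_encode=True))   # "example_requests_total 3\n"
print(response_body(text, json_encode=False))  # example_requests_total 3
```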
Diego Devesa e072b2052e ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276)
* ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched
Enabled in ggml-ci for testing.

* llama : update worst-case graph for unified cache

* ci : disable op offload in some tests

* fix spelling

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-28 17:33:23 +02:00
Aleksander Grygier 171a0926a1 chore: update webui build output 2025-11-28 16:00:44 +01:00
Aleksander Grygier 68b653ef45 refactor: DRY `getAttachmentDisplayItems` function + fix UI 2025-11-28 15:58:52 +01:00
Aleksander Grygier 1cf5daa8c0 refactor: Cleanup 2025-11-28 15:56:41 +01:00
Aleksander Grygier 04ef4a06e2 chore: update webui build output 2025-11-28 15:44:43 +01:00
Aleksander Grygier 5fadd0fe18 refactor: Components naming 2025-11-28 15:39:47 +01:00
Aleksander Grygier 3470b12b76 chore: update webui build output 2025-11-28 15:09:55 +01:00
Aleksander Grygier eed1bd9b97 refactor: Enhance model info and attachment handling 2025-11-28 15:08:41 +01:00
R0CKSTAR c6f7a423c8 [MUSA] enable fp16/fast_fp16/bf16_mma on PH1 (#17551)
* [MUSA] enable fp16/fast_fp16/bf16_mma on PH1

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Update ggml/src/ggml-cuda/fattn-vec.cuh

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update ggml/src/ggml-cuda/fattn-vec.cuh

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update ggml/src/ggml-cuda/fattn-tile.cuh

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Address review comments

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-11-28 14:08:29 +01:00
Aman Gupta 2e7ef98f18 ggml-cuda: add stricter checking for fusion (#17568)
* ggml-cuda: make conditions for fusion more explicit

* ggml-cuda: remove size check as std::equal already does it
2025-11-28 20:34:51 +08:00