Xuan Son Nguyen
f2dbe9c087
rm unused fn
2025-12-01 14:20:43 +01:00
Georgi Gerganov
d182544c99
server : minor
2025-12-01 13:35:40 +02:00
Xuan Son Nguyen
4a1c05c383
fix invalid ptr to shutdown_handler
2025-11-30 15:31:05 +01:00
Xuan Son Nguyen
7b28b5e16a
fix duplicated arg
2025-11-30 14:53:47 +01:00
Xuan Son Nguyen
802e77eaf4
Merge remote-tracking branch 'webui/allozaur/server_model_management_v1_2' into xsn/server_model_management_v1_2
2025-11-29 23:54:34 +01:00
Xuan Son Nguyen
23cb411317
also route anthropic endpoints
2025-11-29 23:29:06 +01:00
Aleksander Grygier
e8b9d74b3b
chore: update webui build output
2025-11-29 23:18:45 +01:00
Aleksander Grygier
acd3c58152
refactor: Remove redundant method
2025-11-29 23:18:24 +01:00
Aleksander Grygier
360a5ed62b
test: Move demo test to tests/server
2025-11-29 23:17:34 +01:00
Xuan Son Nguyen
a82dbbfb30
decouple server_models from server_routes
2025-11-29 23:00:35 +01:00
Xuan Son Nguyen
c1dfccd078
Merge branch 'master' into xsn/server_model_management_v1_2
2025-11-29 22:34:16 +01:00
Xuan-Son Nguyen
ab49f094d2
server: move server-context to its own cpp|h (#17595)
* git mv
* add server-context.h
* add server-context.h
* clean up headers
* cont : cleanup
* also expose server_response_reader (to be used by CLI)
* fix windows build
* decouple server_routes and server_http
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-29 22:04:44 +01:00
Aleksander Grygier
6fd720e742
Merge remote-tracking branch 'origin/allozaur/server_model_management_v1_2' into allozaur/server_model_management_v1_2
2025-11-29 21:59:33 +01:00
Aleksander Grygier
ae8a1e8137
refactor: Tests to separate location
2025-11-29 21:44:57 +01:00
Aleksander Grygier
949b5fd63e
refactor: Tooltip Provider from core layout
2025-11-29 21:41:36 +01:00
Aleksander Grygier
4f39da823f
test: Update Chat Form UI tests
2025-11-29 20:13:11 +01:00
Aleksander Grygier
33b9cc40a1
Merge branch 'master' into allozaur/server_model_management_v1_2
2025-11-29 19:40:46 +01:00
Haiyue Wang
8c32d9d96d
server: explicitly set the function name in lambda (#17538)
As explained in [1], the actual debug message looks like:
"res operator(): operator() : queue result stop"
With the name set explicitly, the message is easier to use for debugging:
"res operator(): recv : queue result stop"
The leftmost "operator()" is generated by 'RES_DBG() ... __func__'
[1]: https://clang.llvm.org/extra/clang-tidy/checks/bugprone/lambda-function-name.html
Signed-off-by: Haiyue Wang <haiyuewa@163.com>
2025-11-29 18:43:29 +01:00
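The behaviour described in the commit above can be reproduced with a minimal sketch. It assumes GCC or Clang; `DEMO_FUNC_NAME`, `name_from_lambda`, `name_set_explicitly` and `check_names` are hypothetical names invented for this illustration and are not the project's actual macro or code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a debug macro that records the enclosing
// function name via __func__ (not the project's actual macro).
#define DEMO_FUNC_NAME() std::string(__func__)

// Inside a lambda body, __func__ refers to the closure's call operator,
// so GCC and Clang report "operator()" instead of a useful name.
inline std::string name_from_lambda() {
    auto recv = []() { return DEMO_FUNC_NAME(); };
    return recv();
}

// Setting the name explicitly inside the lambda keeps the message readable.
inline std::string name_set_explicitly() {
    auto recv = []() {
        static const char * func_name = "recv"; // assumed label for this sketch
        return std::string(func_name);
    };
    return recv();
}

// Small self-check for the explicit-name variant.
inline void check_names() {
    assert(name_set_explicitly() == "recv");
}
```

Calling `name_from_lambda()` shows why the log line contained "operator()", while `name_set_explicitly()` yields the label chosen by the author of the lambda.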
Igor Smirnov
0874693b44
common : fix json schema with '\' in literals (#17307)
* Fix json schema with '\' in literals
* Add "literal string with escapes" test
2025-11-29 17:06:32 +01:00
Neo Zhang
7d2add51d8
sycl : support to malloc memory on device more than 4GB, update the doc and script (#17566)
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2025-11-29 14:59:44 +02:00
ixgbe
f698a79c63
ggml: replace hwcap with riscv_hwprobe for RVV detection (#17567)
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-29 14:56:31 +02:00
Ruben Ortlam
47a268ea50
Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (#16900)
* vulkan: split mul_mmq_funcs for mul_mat_vecq use
* add mxfp4 mmvq
* add q2_k mmvq
* add q3_k mmvq
* add q4_k and q5_k mmvq
* add q6_k mmvq
* handle 4x4 quants per mmvq thread
* enable MUL_MAT_ID mmvq support
* enable subgroup optimizations for mul_mat_vec_id shaders
* device tuning
* request prealloc_y sync after quantization
* fix indentation
* fix llvmpipe test failures
* fix mul_mat_id mmvq condition
* fix unused variable warning
2025-11-29 09:37:22 +01:00
Jeff Bolz
59d8d4e963
vulkan: improve topk perf for large k, fix overflow in unit tests (#17582)
2025-11-29 08:39:57 +01:00
Aleksander Grygier
a568e74c20
chore: update webui build output
2025-11-29 02:40:09 +01:00
Aleksander Grygier
2d556bb93c
test: Fix Storybook mocks
2025-11-29 02:36:41 +01:00
Aleksander Grygier
493ef08723
refactor: Utils imports + move types to `app.d.ts`
2025-11-29 02:33:37 +01:00
Aleksander Grygier
ce9c9afe0d
chore: update webui build output
2025-11-29 01:40:00 +01:00
Aleksander Grygier
2464e06028
feat: Improve UI sidebar background color
2025-11-29 01:39:40 +01:00
Aleksander Grygier
27b152267f
refactor: Constants
2025-11-29 01:38:02 +01:00
Aleksander Grygier
648d2deebc
feat: Attachment logic & UI improvements
2025-11-29 01:36:05 +01:00
Aleksander Grygier
d49d97c642
refactor: Cleanup
2025-11-29 00:51:18 +01:00
Aleksander Grygier
f50ce7b5b4
refactor: Cleanup
2025-11-29 00:50:16 +01:00
Aleksander Grygier
4d16459b4c
re
2025-11-29 00:49:46 +01:00
Aleksander Grygier
c76de5e0ad
refactor: Cleanup
2025-11-29 00:49:20 +01:00
Aleksander Grygier
2f97dbfa65
docs: Add info comment
2025-11-29 00:49:03 +01:00
Aleksei Nikiforov
d82b7a7c1d
gguf-py : fix passing non-native endian tensors (editor-gui and new-metadata) (#17553)
gguf_new_metadata.py reads tensor data from the reader.
The reader doesn't byteswap tensors to native endianness,
but the writer expects tensors in native endianness so that it can convert them
into the requested endianness.
There are two ways to fix this: update the reader to convert to native endianness and back,
or skip the endianness conversion in the writer for this particular use case.
gguf_editor_gui.py doesn't allow editing or viewing tensor data,
so this change skips the redundant byteswapping.
If the capability to view or edit tensor data is eventually added,
tensor data should instead be byteswapped when it is read.
2025-11-28 20:53:01 +01:00
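The reader/writer mismatch described in the commit above can be illustrated with a small sketch. This is not the gguf-py code (which is Python); `byteswap32`, `read_raw`, `write_out` and `copy_tensor` are hypothetical names, and 32-bit elements are assumed purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reverse the byte order of a single 32-bit value.
inline uint32_t byteswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Hypothetical reader: hands back the stored values untouched (no byteswap).
inline std::vector<uint32_t> read_raw(const std::vector<uint32_t> & stored) {
    return stored;
}

// Hypothetical writer: when swap_on_write is true it assumes native-endian
// input and byteswaps every element to reach the requested byte order.
inline std::vector<uint32_t> write_out(std::vector<uint32_t> data, bool swap_on_write) {
    if (swap_on_write) {
        for (uint32_t & v : data) {
            v = byteswap32(v);
        }
    }
    return data;
}

// Copy a tensor from reader to writer, as a metadata-only tool would.
inline std::vector<uint32_t> copy_tensor(const std::vector<uint32_t> & stored, bool swap_on_write) {
    return write_out(read_raw(stored), swap_on_write);
}
```

With `swap_on_write` set, data that was never converted to native order gets swapped anyway and no longer matches the stored bytes; skipping the swap passes the tensor through unchanged, which is the option the commit message chooses for tools that never inspect tensor contents.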
Aleksander Grygier
1adf173dd6
refactor: Cleanup
2025-11-28 19:36:03 +01:00
Aleksander Grygier
dd30810d0a
fix: Modality detection improvement for text-based PDF attachments
2025-11-28 19:30:32 +01:00
DAN™
03914c7ef8
common : move all common_chat_parse_* to chat-parser.cpp. (#17481)
2025-11-28 19:29:36 +01:00
o7si
3ce7a65c2f
server: fix: /metrics endpoint returning JSON-escaped Prometheus format (#17386)
* fix: /metrics endpoint returning JSON-escaped Prometheus format
* mod: remove string overload from ok() method
2025-11-28 19:14:00 +01:00
Diego Devesa
e072b2052e
ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276)
* ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched
Enabled in ggml-ci for testing.
* llama : update worst-case graph for unified cache
* ci : disable op offload in some tests
* fix spelling
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-28 17:33:23 +02:00
Aleksander Grygier
171a0926a1
chore: update webui build output
2025-11-28 16:00:44 +01:00
Aleksander Grygier
68b653ef45
refactor: DRY `getAttachmentDisplayItems` function + fix UI
2025-11-28 15:58:52 +01:00
Aleksander Grygier
1cf5daa8c0
refactor: Cleanup
2025-11-28 15:56:41 +01:00
Aleksander Grygier
04ef4a06e2
chore: update webui build output
2025-11-28 15:44:43 +01:00
Aleksander Grygier
5fadd0fe18
refactor: Components naming
2025-11-28 15:39:47 +01:00
Aleksander Grygier
3470b12b76
chore: update webui build output
2025-11-28 15:09:55 +01:00
Aleksander Grygier
eed1bd9b97
refactor: Enhance model info and attachment handling
2025-11-28 15:08:41 +01:00
R0CKSTAR
c6f7a423c8
[MUSA] enable fp16/fast_fp16/bf16_mma on PH1 (#17551)
* [MUSA] enable fp16/fast_fp16/bf16_mma on PH1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Update ggml/src/ggml-cuda/fattn-vec.cuh
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/fattn-vec.cuh
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/fattn-tile.cuh
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Address review comments
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-11-28 14:08:29 +01:00
Aman Gupta
2e7ef98f18
ggml-cuda: add stricter checking for fusion (#17568)
* ggml-cuda: make conditions for fusion more explicit
* ggml-cuda: remove size check as std::equal already does it
2025-11-28 20:34:51 +08:00