Xuan Son Nguyen
74685f4194
allow reusing args if auto_load
2025-11-23 15:42:33 +01:00
Xuan Son Nguyen
f927e21ffc
support extra_args on loading model
2025-11-23 15:39:03 +01:00
Xuan Son Nguyen
7ef6312f85
add note
2025-11-23 15:08:31 +01:00
Xuan Son Nguyen
f25bfaba4d
expose args and exit_code in API
2025-11-23 14:59:04 +01:00
Xuan Son Nguyen
4af1b6cbac
Merge remote-tracking branch 'webui/allozaur/server_model_management_v1_2' into xsn/server_model_maagement_v1_2
...
Co-authored-by: Aleksander <aleksander.grygier@gmail.com>
2025-11-22 18:39:31 +01:00
Xuan Son Nguyen
d32bbfec82
add endpoint docs
2025-11-22 18:01:48 +01:00
Xuan Son Nguyen
f2ca54b202
Merge branch 'master' into xsn/server_model_management_v1_2
2025-11-22 13:21:13 +01:00
Masato Nakasaka
3f3a4fb9c3
Revive MUL_MAT_ID to perf testing ( #17397 )
2025-11-22 10:55:43 +01:00
yulo
028f93ef98
HIP: RDNA4 tensor core support for MMF ( #17077 )
...
* mmf for rdna4
* align the padding for rdna4
* forbid mul_mat_f for rdna4
* fix as comment
* remove device kernels
* add constexpr for early return
* update based on review comment
* change based on the review comment
* pass compile error
* keep code consistency
---------
Co-authored-by: zhang hui <you@example.com>
2025-11-22 00:03:24 +01:00
lhez
8e9ddba610
opencl: refine condition for kqv mm ( #17392 )
2025-11-21 14:34:48 -08:00
Xuan Son Nguyen
457fbdac2c
fix compile
2025-11-21 23:26:32 +01:00
Xuan Son Nguyen
525e2746df
address review comments
2025-11-21 23:25:34 +01:00
Xuan Son Nguyen
b0540e8e1e
add env for args
2025-11-21 23:06:49 +01:00
Xuan Son Nguyen
7241558835
better --models-dir
2025-11-21 23:06:09 +01:00
Xuan Son Nguyen
7cd929076d
remove default model path
2025-11-21 22:33:04 +01:00
Xuan Son Nguyen
62ee883d5a
implement LRU
2025-11-21 22:22:57 +01:00
ubergarm
23bc779a6e
model : detect GigaChat3-10-A1.8B as deepseek lite ( #17420 )
...
* Detect GigaChat3-10-A1.8B as deepseek lite
Hardcodes checking number of layers to detect if lite version of deepseek.
* Add comment identifying deepseek lite variants
deepseek lite variants include DeepSeek-V2-Lite, GigaChat3-10B-A1.8B
2025-11-21 14:51:38 +01:00
Adrien Gallouët
28175f857d
cmake : add option to build and link BoringSSL ( #17205 )
...
* cmake: add option to build and link BoringSSL
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* cmake : fix typo
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* cmake : disable boringssl test and asm by default
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* cmake : skip bssl
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* cmake : disable fips
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* cmake : fix cmake --install
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* ci : use boringssl for windows and mac
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-21 11:46:45 +01:00
Adrien Gallouët
9cc4080441
ci : start using OpenSSL ( #17235 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-21 11:45:00 +01:00
Xuan Son Nguyen
032b9ff4a9
add --models-dir param
2025-11-21 11:11:01 +01:00
Jeff Bolz
f1ffbba68e
vulkan: disable async for older Intel devices ( #17369 )
...
* vulkan: disable async for older Intel devices
* update detection logic
* use name string for detection
2025-11-21 09:58:17 +01:00
Raul Torres
2370665e56
CANN: Refactor `evaluate_and_capture_cann_graph` ( #17333 )
...
* CANN: Refactor `evaluate_and_capture_cann_graph`
**Description of the problem**
* `matched_graph` is obtained even if graph mode is disabled.
* End of graph capture and graph replay are unnecessarily placed in different `if` blocks.
**Proposed solution**
* Obtain `matched_graph` only if graph mode is enabled.
* Place end of graph capture and graph replay inside the same `if` block.
* Unify graph related comments.
* Remove trailing whitespace
2025-11-21 16:23:29 +08:00
nullname
21d31e0810
ggml-hexagon: fix swiglu failure at `test-backend-ops` ( #17344 )
...
* refactor: use hvx_vec_exp_fp32_guard_inf for overflow handling in hvx_exp_f32
* feat: add fast sigmoid function with overflow guard for fp32
* refactor: replace hvx_vec_inverse_fp32 with hvx_vec_inverse_fp32_guard_inf for improved overflow handling
* feat: enhance hvx_add_scalar_f32 with overflow handling using infinity guard
* wip
* add HVX_Vector_Alias
wip
* wip
* fix: improve handling of src1 tensor in glu_swiglu_fp32_per_thread function
* fix nc
* wip
* wip
* handle nan at inverse
* wip
* fix neg
* wip
* rename
* fix hvx_vec_inverse_fp32_guard_inf to handle infinity and NaN cases correctly
* wip
* fix hvx_vec_inverse_fp32_guard_inf to handle NaN cases correctly
* wip
* wip
* wip
* fix output sign
2025-11-20 15:45:05 -08:00
Xuan Son Nguyen
a2e912cf35
address review comment
2025-11-20 21:54:22 +01:00
Xuan Son Nguyen
cd5c699304
add docs (first version)
2025-11-20 21:45:05 +01:00
Xuan Son Nguyen
be25bccdff
address review comment
2025-11-20 21:37:22 +01:00
Daniel Han
dd0f321941
readme : add Unsloth exporting to GGUF in tools ( #17411 )
2025-11-20 20:07:36 +01:00
Xuan Son Nguyen
6929c9f43d
address thread safety issue
2025-11-20 18:38:02 +01:00
Xuan-Son Nguyen
054a45c3d3
grammar: fix regression caused by #17381 ( #17412 )
...
* grammar: fix regression caused by #17381
* more readable
2025-11-20 18:35:10 +01:00
Xuan Son Nguyen
5369aaa1d6
address most problems
2025-11-20 18:34:22 +01:00
Xuan Son Nguyen
216140867e
tmp apply upstream fix
2025-11-20 18:19:21 +01:00
Xuan Son Nguyen
5805ca7960
add is_active()
2025-11-20 16:26:31 +01:00
Xuan Son Nguyen
d0ea9e0830
also allow terminate loading model
2025-11-20 16:20:14 +01:00
Xuan Son Nguyen
6610724f8e
fix unsafe pointer
2025-11-20 16:13:30 +01:00
Xuan Son Nguyen
b9ebdf616a
more stable
2025-11-20 15:49:40 +01:00
Xuan Son Nguyen
919d3f8cbf
Merge branch 'master' into xsn/server_model_management_v1_2
2025-11-20 14:19:16 +01:00
Aleksander Grygier
4c91f2633f
Improved file naming & structure for UI components ( #17405 )
...
* refactor: Component files naming & structure
* chore: update webui build output
* refactor: Dialog titles + components naming
* chore: update webui build output
* refactor: Imports
* chore: update webui build output
2025-11-20 14:07:31 +01:00
Piotr Wilkin (ilintar)
92c0b387a9
grammar : fix integer overflow ( #17381 )
...
* Fix DoS / integer overflow
* Remove optional, use INT64_MAX instead as placeholder value (it's technically -1, so it fits :)
* White space
* Actually, since it's unsigned, use UINT64_MAX
2025-11-20 14:47:04 +02:00
Xuan Son Nguyen
7c6eb17fad
fix windows
2025-11-20 13:14:56 +01:00
Georgi Gerganov
2286a360ff
sync : ggml
2025-11-20 14:10:44 +02:00
YangLe
1d321e592b
metal : fix compile on macos 11 (whisper/3533)
2025-11-20 14:10:44 +02:00
Georgi Gerganov
196f5083ef
common : more accurate sampling timing ( #17382 )
...
* common : more accurate sampling timing
* eval-callback : minor fixes
* cont : add time_meas impl
* cont : fix log msg [no ci]
* cont : fix multiple definitions of time_meas
* llama-cli : exclude chat template init from time measurement
* cont : print percentage of unaccounted time
* cont : do not reset timings
2025-11-20 13:40:10 +02:00
o7si
5088b435d4
convert : fix TypeError when loading base model remotely in convert_lora_to_gguf ( #17385 )
...
* fix: TypeError when loading base model remotely in convert_lora_to_gguf
* refactor: simplify base model loading using cache_dir from HuggingFace
* Update convert_lora_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* feat: add remote_hf_model_id to trigger lazy mode in LoRA converter
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-20 12:30:12 +01:00
Piotr Wilkin (ilintar)
845f200b28
ggml : Fix transposed SOLVE_TRI result ( #17323 )
...
* Did someone transpose the SOLVE_TRI result matrix? Perhaps...
* Update ggml/src/ggml-cpu/ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update ggml/src/ggml-cpu/ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-20 12:58:21 +02:00
Scott Fudally
a7784a8b1d
DGX Spark: UMA support ( #17368 )
...
* DGX Spark: UMA support
* Updates from PR feedback
* More PR feedback cleanup
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Remove trailing whitespace
* Update ggml/src/ggml-cuda/ggml-cuda.cu
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-20 12:32:02 +02:00
Adrien Gallouët
79bb743512
ggml : remove useless and error-prone variadic macros ( #17399 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-20 11:18:27 +01:00
sudhiarm
3ae282a06f
kleidiai: fix zero-size array declaration ( #17240 )
2025-11-20 11:45:49 +02:00
ixgbe
5be353ec4a
ggml-cpu:add RISC-V RVV (Zvfh) optimization for FP16 vector scaling ( #17314 )
...
* ggml-cpu:add RISC-V RVV (Zvfh) optimization for FP16 vector scaling
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
* fix comment
* fix comment 2
---------
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-20 08:09:18 +02:00
Xuan Son Nguyen
0ef3b61e82
add test
2025-11-20 00:29:59 +01:00
Xuan Son Nguyen
5423d42a35
use subprocess.h, better logging
2025-11-20 00:05:29 +01:00