Commit Graph

7284 Commits

Author SHA1 Message Date
Aleksander Grygier bc577266b9 docs: Architecture documentation 2025-11-27 22:04:20 +01:00
Aleksander Grygier db479523ec feat: Condition available models based on modality + better model loading strategy & UX 2025-11-27 19:13:05 +01:00
Aleksander Grygier 9086bc30bd feat: Improve statistic badges 2025-11-27 14:12:21 +01:00
Aleksander Grygier d73353732f refactor: Architecture cleanup 2025-11-27 14:03:25 +01:00
Aleksander Grygier 78ead49830 Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2' into allozaur/server_model_management_v1_2 2025-11-27 13:48:21 +01:00
Aleksander Grygier 6a3d6e79d2 refactor: Services/Stores syntax + logic improvements
Refactors components to access stores directly instead of using exported getter functions.

This change centralizes store access and logic, simplifying component code and improving maintainability by reducing the number of exported functions and promoting direct store interaction.

Removes exported getter functions from `chat.svelte.ts`, `conversations.svelte.ts`, `models.svelte.ts` and `settings.svelte.ts`.
2025-11-27 13:44:49 +01:00
Aleksander Grygier 69065ddc56 fix: UI 2025-11-27 11:27:58 +01:00
Aleksander Grygier 6b95118abc refactor: Processing state reactivity 2025-11-27 11:11:45 +01:00
Aleksander Grygier 2a5922b1f6 chore: update webui build output 2025-11-26 17:52:40 +01:00
Aleksander Grygier 13e7988459 refactor: Model modality handling 2025-11-26 17:51:25 +01:00
Xuan Son Nguyen 1493ee09ea tmp webui build 2025-11-26 17:43:27 +01:00
Aleksander Grygier d6ee3d133a refactor: Server store 2025-11-26 17:16:41 +01:00
Aleksander Grygier 456828b365 refactor: Chat requests abort handling 2025-11-26 16:48:13 +01:00
Aleksander Grygier 42483f463d refactor: Remove ConversationsService 2025-11-26 16:45:07 +01:00
Xuan Son Nguyen becc602612 Merge branch 'master' into xsn/server_model_management_v1_2 2025-11-26 16:21:57 +01:00
Xuan Son Nguyen e2731c3767 set hf_repo/docker_repo as model alias when possible 2025-11-26 15:57:20 +01:00
Xuan Son Nguyen e40f35fb61 remove support for extra args 2025-11-26 15:43:27 +01:00
Aleksander Grygier ddf98bdf28 refactor: Improve API header management via utility functions 2025-11-26 15:36:09 +01:00
Aleksander Grygier 9431f358b8 chore: update webui build output 2025-11-26 15:07:12 +01:00
Aleksander Grygier 284557cd2f feat: Improve model loading/unloading status updates 2025-11-26 15:06:11 +01:00
xctan 6ab4e50d9c
ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (#17448)
* ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16

* ggml-cpu : dedup scalar impl

* Update ggml/src/ggml-cpu/vec.h

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-26 15:33:05 +02:00
Adrien Gallouët 2336cc4784
cmake : use EXCLUDE_FROM_ALL to avoid patch-boringssl.cmake (#17520)
We have to separate the code path starting 3.28 because
`FetchContent_Populate` is now deprecated and will be completely removed
in a future version.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-26 15:15:21 +02:00
Adrien Gallouët e6923caaec
ggml : fix ARM feature verification (#17519)
On arm64 with `cmake` version 3.31.6, the final feature verification fails:

    -- ARM detected flags: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
    -- Performing Test GGML_MACHINE_SUPPORTS_dotprod
    -- Performing Test GGML_MACHINE_SUPPORTS_dotprod - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_i8mm
    -- Performing Test GGML_MACHINE_SUPPORTS_i8mm - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_sve
    -- Performing Test GGML_MACHINE_SUPPORTS_sve - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_sme
    -- Performing Test GGML_MACHINE_SUPPORTS_sme - Failed
    -- Performing Test GGML_MACHINE_SUPPORTS_nosme
    -- Performing Test GGML_MACHINE_SUPPORTS_nosme - Success
    -- Checking for ARM features using flags:
    --   -U__ARM_FEATURE_SME
    --   -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme
    -- Performing Test HAVE_DOTPROD
    -- Performing Test HAVE_DOTPROD - Failed
    -- Performing Test HAVE_SVE
    -- Performing Test HAVE_SVE - Failed
    -- Performing Test HAVE_MATMUL_INT8
    -- Performing Test HAVE_MATMUL_INT8 - Failed
    -- Performing Test HAVE_FMA
    -- Performing Test HAVE_FMA - Success
    -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC
    -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC - Failed
    -- Performing Test HAVE_SME
    -- Performing Test HAVE_SME - Failed
    -- Adding CPU backend variant ggml-cpu: -U__ARM_FEATURE_SME;-mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme

We need to explicitly replace `;` with spaces from the list to make
`CMAKE_REQUIRED_FLAGS` work correctly...

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-26 15:14:41 +02:00
Aleksander Grygier d0d7a88d13 chore: update webui build output 2025-11-26 14:14:15 +01:00
Aleksander Grygier 23a91cd257 refactor: Icons 2025-11-26 14:13:17 +01:00
Aleksander Grygier b1cf8bb814 refactor: Improve server properties management 2025-11-26 14:05:42 +01:00
Jiacheng (Jason) Chen 3e18dba9fd
HIP: Patch failed testcase in WMMA-MMQ kernels for RDNA 4 (#17502)
* patch failed test case MUL_MAT(type_a=q4_0,type_b=f32,m=576,n=512,k=576,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) for enabling WMMA on RDNA4

* Quick clean up on mma.cuh to add ggml_cuda_memcpy_1 back in for half2 and bfloat162
2025-11-26 11:18:48 +01:00
hipudding eeb5605de2
CANN: Add MROPE and IMROPE support (#17401)
* CANN: ROPE supports both MROPE and IMROPE.

1. Optimize the caching logic of rope_cache_init.
2. Add support for mRoPE and i-mRoPE.

Note that on Ascend 910B devices, it is necessary to disable FA
in CLIP and disable NZ-format conversion. These two issues are
still under investigation.

* Resolve review comments
2025-11-26 16:44:19 +08:00
o7si f3a848a3b1
chore: upgrade cpp-httplib from v0.27.0 to v0.28.0 (#17513) 2025-11-26 09:21:06 +02:00
Jeff Bolz b3b03a7baf
vulkan: Implement GGML_OP_CUMSUM (#17479) 2025-11-26 07:08:10 +01:00
Aleksander Grygier 19e5385bd5 chore: update webui build output 2025-11-26 02:14:33 +01:00
Aleksander Grygier 2a280b6082 feat: Model management and selection features WIP 2025-11-26 02:13:31 +01:00
Aleksander Grygier 81b8e1abb4 chore: update webui build output 2025-11-26 00:44:18 +01:00
Aleksander Grygier 22507fed74 refactor: Icons 2025-11-26 00:43:49 +01:00
Aleksander Grygier 5207527e9d fix: Audio attachments 2025-11-26 00:21:36 +01:00
Aleksander Grygier c680083cce feat: Remove redundant settings + rearrange 2025-11-26 00:08:04 +01:00
Aleksander Grygier 33356f36e4 fix: Regenerate 2025-11-26 00:03:17 +01:00
Aleksander Grygier 82975a1f2d fix: Add `untrack` inside chat processing info data logic to prevent infinite effect 2025-11-26 00:01:36 +01:00
Aleksander Grygier 013244933b chore: update webui build output 2025-11-25 17:15:48 +01:00
Aleksander Grygier b9a3129d42 feat: Switching models logic for ChatForm or when regenerating messages + modality detection logic 2025-11-25 17:13:10 +01:00
Aleksander Grygier 4c24ead8e0 chore: update webui build output 2025-11-25 15:06:32 +01:00
Aleksander Grygier 501badc9c4 refactor: Multi-model business logic WIP 2025-11-25 15:04:46 +01:00
Georgi Gerganov 583cb83416
ggml : add ggml_top_k (#17365)
* ggml : add ggml_top_k

* cont : add ggml_argsort_top_k

* metal : add top_k support

* ggml : cleanup

* tests : add virtual err() function for test_case

* ggml : add comments
2025-11-25 15:31:43 +02:00
Aleksei Nikiforov 05872ac885
convert : fix big-endian conversion (#17431)
* Fix convert_hf_to_gguf.py script on s390x

Assume converted model data is originally little-endian.
Byteswap data on s390x after reading it to put values in correct presentation
for any transformation needed, like calculating weight tensors.

Then byteswap data to little-endian before passing it to GGUFWriter while
GGUFWriter will byteswap data back to big endian if big endian output is requested.

byteswap(inplace=True) calls don't work with lazy tensor and array wrappers.
Use byteswap with copying data to workaround this behaviour.

* Make GGUFWriter accept tensors in native endianness instead of little-endian

With this change if no byteswapping is actually needed, 2 excessive byteswaps can be omitted on s390x

* Fix byteswapping in convert_hf_to_gguf.py for remote models
2025-11-25 14:18:16 +01:00
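The copy-versus-in-place distinction described in the commit above (#17431) can be sketched with Python's standard `array` module, whose `byteswap()` also mutates in place. This is only an illustration under stated assumptions: the converter itself operates on NumPy arrays and lazy tensor wrappers, which are not reproduced here, and the helper name `byteswapped_copy` is hypothetical.

```python
from array import array


def byteswapped_copy(values: array) -> array:
    """Return a byteswapped copy and leave the input untouched.

    Hypothetical helper: it copies first and swaps the copy, which is the
    general shape of "byteswap with copying" as opposed to an in-place swap.
    """
    swapped = array(values.typecode, values)  # independent copy of the data
    swapped.byteswap()                        # array.byteswap() works in place
    return swapped


original = array("H", [1, 2, 0x0102])  # unsigned 16-bit values on common platforms
swapped = byteswapped_copy(original)

# Swapping the copy a second time restores the original values.
restored = byteswapped_copy(swapped)
```

Because the swap happens on a copy, any wrapper that still references `original` keeps seeing the unswapped data, which is the property an in-place swap cannot guarantee.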
Diego Devesa 55ab25caf5
codeowners : remove slaren (#17492) 2025-11-25 13:00:23 +01:00
Aleksander Grygier f9c911d025 refactor: Remove redundant settings 2025-11-25 10:55:08 +01:00
TianHao324 064c90d843
CANN: supports out_prod operator for F32 and F16 (#17406)
Co-authored-by: tianhao <tianhao42@huawei.com>
2025-11-25 17:39:06 +08:00
Aleksander Grygier fed6c82eeb refactor: Database, Conversations & Chat services + stores architecture improvements (WIP) 2025-11-25 10:26:09 +01:00
Aleksander Grygier ccd6c27183 refactor: DatabaseStore -> DatabaseService 2025-11-25 08:08:32 +01:00
Pascal b1846f1c8e
webui: add rehype plugin to restore HTML in Markdown table cells (#17477)
* webui: add rehype plugin to restore HTML in Markdown table cells

The remark/rehype pipeline neutralizes inline HTML as literal text
(remarkLiteralHtml) so that XML/HTML snippets in LLM responses display
as-is instead of being rendered. This causes <br> and <ul> markup in
table cells to show as plain text.

This plugin traverses the HAST post-conversion, parses whitelisted HTML
patterns (<br>, <ul><li>) from text nodes, and replaces them with actual
HAST element nodes. For lists, adjacent siblings must be combined first
as the AST fragmentation breaks pattern matching.

Strict validation rejects malformed markup, keeping it as raw text.

* chore: update webui build output
2025-11-25 08:01:02 +01:00
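The tree rewrite described in the commit above (#17477) can be sketched in Python over plain dictionaries shaped like HAST nodes. This is a minimal sketch under stated assumptions, not the plugin itself: the actual plugin is part of the webui's remark/rehype pipeline, and the names `restore_br`, `split_text` and `BR_PATTERN` are hypothetical. Only `<br>` is handled; the `<ul><li>` case and the merging of adjacent siblings are left out.

```python
import re

# Whitelist for this sketch: a literal <br>, optionally self-closed.
BR_PATTERN = re.compile(r"<br\s*/?>", re.IGNORECASE)


def split_text(value: str) -> list:
    """Split a text value into text nodes and <br> element nodes."""
    nodes = []
    pos = 0
    for match in BR_PATTERN.finditer(value):
        if match.start() > pos:
            nodes.append({"type": "text", "value": value[pos:match.start()]})
        nodes.append(
            {"type": "element", "tagName": "br", "properties": {}, "children": []}
        )
        pos = match.end()
    if pos < len(value):
        nodes.append({"type": "text", "value": value[pos:]})
    return nodes


def restore_br(node: dict, in_cell: bool = False) -> dict:
    """Return a copy of the tree with literal <br> restored inside table cells."""
    children = node.get("children")
    if children is None:
        return dict(node)
    # Text is only rewritten once we are below a <td> or <th> element.
    in_cell = in_cell or node.get("tagName") in ("td", "th")
    new_children = []
    for child in children:
        if child.get("type") == "text" and in_cell:
            new_children.extend(split_text(child["value"]))
        else:
            new_children.append(restore_br(child, in_cell))
    return {**node, "children": new_children}


cell = {
    "type": "element",
    "tagName": "td",
    "properties": {},
    "children": [{"type": "text", "value": "first<br>second"}],
}
restored_cell = restore_br(cell)
```

Text outside table cells is copied unchanged, so markup that the pipeline deliberately shows as literal text elsewhere stays literal.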