llama.cpp

Commit Graph

Author	SHA1	Message	Date
Xuan Son Nguyen	1493ee09ea	tmp webui build	2025-11-26 17:43:27 +01:00
Xuan Son Nguyen	becc602612	Merge branch 'master' into xsn/server_model_management_v1_2	2025-11-26 16:21:57 +01:00
Xuan Son Nguyen	e2731c3767	set hf_repo/docker_repo as model alias when possible	2025-11-26 15:57:20 +01:00
Xuan Son Nguyen	e40f35fb61	remove support for extra args	2025-11-26 15:43:27 +01:00
xctan	6ab4e50d9c	ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (#17448 ) * ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 * ggml-cpu : dedup scalar impl * Update ggml/src/ggml-cpu/vec.h --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-26 15:33:05 +02:00
Adrien Gallouët	2336cc4784	cmake : use EXCLUDE_FROM_ALL to avoid patch-boringssl.cmake (#17520 ) We have to separate the code path starting 3.28 because `FetchContent_Populate` is now deprecated and will be completely removed in a future version. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-11-26 15:15:21 +02:00
Adrien Gallouët	e6923caaec	ggml : fix ARM feature verification (#17519 ) On arm64 with `cmake` version 3.31.6, the final feature verification fails: -- ARM detected flags: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs -- Performing Test GGML_MACHINE_SUPPORTS_dotprod -- Performing Test GGML_MACHINE_SUPPORTS_dotprod - Success -- Performing Test GGML_MACHINE_SUPPORTS_i8mm -- Performing Test GGML_MACHINE_SUPPORTS_i8mm - Success -- Performing Test GGML_MACHINE_SUPPORTS_sve -- Performing Test GGML_MACHINE_SUPPORTS_sve - Success -- Performing Test GGML_MACHINE_SUPPORTS_sme -- Performing Test GGML_MACHINE_SUPPORTS_sme - Failed -- Performing Test GGML_MACHINE_SUPPORTS_nosme -- Performing Test GGML_MACHINE_SUPPORTS_nosme - Success -- Checking for ARM features using flags: -- -U__ARM_FEATURE_SME -- -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme -- Performing Test HAVE_DOTPROD -- Performing Test HAVE_DOTPROD - Failed -- Performing Test HAVE_SVE -- Performing Test HAVE_SVE - Failed -- Performing Test HAVE_MATMUL_INT8 -- Performing Test HAVE_MATMUL_INT8 - Failed -- Performing Test HAVE_FMA -- Performing Test HAVE_FMA - Success -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC - Failed -- Performing Test HAVE_SME -- Performing Test HAVE_SME - Failed -- Adding CPU backend variant ggml-cpu: -U__ARM_FEATURE_SME;-mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme We need to explicitly replace `;` with spaces from the list to make `CMAKE_REQUIRED_FLAGS` work correctly... Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-11-26 15:14:41 +02:00
Jiacheng (Jason) Chen	3e18dba9fd	HIP: Patch failed testcase in WMMA-MMQ kernels for RDNA 4 (#17502 ) * patch failed test case MUL_MAT(type_a=q4_0,type_b=f32,m=576,n=512,k=576,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) for enabling WMMA on RDNA4 * Quick clean up on mma.cuh to add ggml_cuda_memcpy_1 back in for half2 and bfloat162	2025-11-26 11:18:48 +01:00
hipudding	eeb5605de2	CANN: Add MROPE and IMROPE support (#17401 ) * CANN: ROPE supports both MROPE and IMROPE. 1. Optimize the caching logic of rope_cache_init. 2. Add support for mRoPE and i-mRoPE. Note that on Ascend 910B devices, it is necessary to disable FA in CLIP and disable NZ-format conversion. These two issues are still under investigation. * Resolve review comments	2025-11-26 16:44:19 +08:00
o7si	f3a848a3b1	chore: upgrade cpp-httplib from v0.27.0 to v0.28.0 (#17513 )	2025-11-26 09:21:06 +02:00
Jeff Bolz	b3b03a7baf	vulkan: Implement GGML_OP_CUMSUM (#17479 )	2025-11-26 07:08:10 +01:00
Georgi Gerganov	583cb83416	ggml : add ggml_top_k (#17365 ) * ggml : add ggml_top_k * cont : add ggml_argsort_top_k * metal : add top_k support * ggml : cleanup * tests : add virtual err() function for test_case * ggml : add comments	2025-11-25 15:31:43 +02:00
Aleksei Nikiforov	05872ac885	convert : fix big-endian conversion (#17431 ) * Fix convert_hf_to_gguf.py script on s390x Assume converted model data is originally little-endian. Byteswap data on s390x after reading it to put values in correct presentation for any transformation needed, like calculating weight tensors. Then byteswap data to little-endian before passing it to GGUFWriter while GGUFWriter will byteswap data back to big endian if big endian output is requested. byteswap(inplace=True) calls don't work with lazy tensor and array wrappers. Use byteswap with copying data to workaround this behaviour. * Make GGUFWriter accept tensors in native endianness instead of little-endian With this change if no byteswapping is actually needed, 2 excessive byteswaps can be omitted on s390x * Fix byteswapping in convert_hf_to_gguf.py for remote models	2025-11-25 14:18:16 +01:00
Diego Devesa	55ab25caf5	codeowners : remove slaren (#17492 )	2025-11-25 13:00:23 +01:00
TianHao324	064c90d843	CANN: supports out_prod operator for F32 and F16 (#17406 ) Co-authored-by: tianhao <tianhao42@huawei.com>	2025-11-25 17:39:06 +08:00
Pascal	b1846f1c8e	webui: add rehype plugin to restore HTML in Markdown table cells (#17477 ) * webui: add rehype plugin to restore HTML in Markdown table cells The remark/rehype pipeline neutralizes inline HTML as literal text (remarkLiteralHtml) so that XML/HTML snippets in LLM responses display as-is instead of being rendered. This causes <br> and <ul> markup in table cells to show as plain text. This plugin traverses the HAST post-conversion, parses whitelisted HTML patterns (<br>, <ul><li>) from text nodes, and replaces them with actual HAST element nodes. For lists, adjacent siblings must be combined first as the AST fragmentation breaks pattern matching. Strict validation rejects malformed markup, keeping it as raw text. * chore: update webui build output	2025-11-25 08:01:02 +01:00
Jeff Bolz	d414db02d3	vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16 (#17455 )	2025-11-25 07:11:27 +01:00
Aaron Teo	877566d512	llama: introduce support for model-embedded sampling parameters (#17120 )	2025-11-25 09:56:07 +08:00
Jeff Bolz	3d07caa99b	vulkan: more FA details in vk_perf_logger (#17443 )	2025-11-24 22:25:24 +01:00
Daniel Bevenius	134e6940ca	llama : skip output reordering for single token batches (#17466 ) This commit adds a check to skip the output reordering logic when n_outputs == 1. With a single output token, the data is trivially sorted and the reordering code is currently doing unnecessary work (resetting and rebuilding output_ids to the same values). The motivation for this change is improved code clarity and avoiding confusion when debugging. While the performance impact is probably negligible, this unnecessary work happens on every decode call in llama-server when processing batches with single-token outputs.	2025-11-24 21:06:17 +01:00
Jiacheng (Jason) Chen	0543f928a3	HIP: WMMA-MMQ kernels for RDNA 4 (#17156 ) * first commit naive test to enable mmq for RDNA4 * adding appropriate WMMA instructions * git rebase on top of master: fixing the correctness of the mat mul operations, updating layout mappings for RDNA4 * clean up merge conflicts * add comments and code clean up * PR clean up, addressed comments * enable MMQ fallback on RDNA4 * addressed comments: add guards in load generic, separate wmma branch for use_mmq function * Revert build-xcframework.sh * Formatting: remove trailing whitespace * revert CMake files * clean up after rebase: remove duplicated change, revert cmake files * clean up after rebase: revert changes from build-xcframework.sh * clean up: remove extra space line in mma.cuh * Revert "clean up: remove extra space line in mma.cuh" This reverts commit `b39ed57c45`.	2025-11-24 20:00:10 +01:00
Sigbjørn Skjæret	b61de2b2df	convert : allow quantizing lora again (#17453 )	2025-11-24 15:50:55 +01:00
Xuan Son Nguyen	e514b86d2b	fix merge	2025-11-24 14:50:42 +01:00
Xuan Son Nguyen	399b39f21b	Merge branch 'master' into xsn/server_model_management_v1_2	2025-11-24 14:45:57 +01:00
Xuan-Son Nguyen	b8372eecd9	server: split server.cpp code into server/common/task/queue (#17362 ) * add server-task, server-common * add server-queue * rm redundant includes * move enum stop_type to server-task * server : headers cleanup --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-24 14:41:53 +01:00
Daniel Bevenius	6ab8eacddf	examples : add -kvu to batched usage example [no ci] (#17469 ) This commit adds the --kv-unified flag to the usage example in the README.md file for the batched example. The motivation for this is that without this flag the example will fail with the following error: ```console Hello my name is split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) decode: failed to find a memory slot for batch of size 4 main: llama_decode() failed ```	2025-11-24 15:38:45 +02:00
Georgi Gerganov	2d50b9d8cb	sync : ggml	2025-11-24 15:26:31 +02:00
Daniel Bevenius	697edfeead	ggml : remove dirty flag from version string (ggml/1391) This commit removes the "-dirty" suffix from the GGML version string. The motivation for this change is to ensure that the version string works with different ways of checking out ggml and using it in projects. By removing the dirty flag from the version string, we avoid potential artifacts like shared libraries getting a -dirty suffix in their names. Instead, if the project is built from a dirty git state, the dirty flag will be appended to the commit hash in the GGML_BUILD_COMMIT variable. This will enable users to still identify that the build was made from a modified/dirty state even though the version might match a "real" version. For example, the commit can be produced as follows: ```c++ printf("commit: %s\n", ggml_commit()); ``` Which would print the following for a dirty build: ```console commit: 781baf2a-dirty ``` Refs: https://github.com/ggml-org/ggml/pull/1363#issuecomment-3569691546	2025-11-24 15:26:31 +02:00
Xuan Son Nguyen	539cbf003e	add stdin_file	2025-11-24 14:21:21 +01:00
Xuan Son Nguyen	2c6b58f785	nits	2025-11-24 12:20:34 +01:00
Alberto Cabrera Pérez	dbb852b549	ggml-cpu: arm64: q4_K repack gemm and gemv implementations (i8mm) (#16739 ) * Enabled q4_K_8x8_q8_K path on ARM * wip: I8mm qs multiplication, pending bias * cpu : arm : REPACK gemm q4_K8x8 implementation Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * Guard gemm with proper features, improved superblock scale and min calc Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * cpu: arm: Implemented REPACK gemv for Q4_K Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * Removed completed TODO * Fixed missing guards when selecting optimal repack type for Q4_K Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * Fixed macro guard for gemv * Fixed wrong comment in GEMV * Fixed warning for unused variable * vdotq_s32 -> ggml_vdotq_s32 Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * Clang-format issues * Apply suggestions from code review Co-authored-by: Diego Devesa <slarengh@gmail.com> * Removed unnecessary GGML_UNUSED * Fixed guards in q4_k gemm and gemv (repack) --------- Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-11-24 13:08:11 +02:00
ixgbe	5f55c385cb	ggml: add RISC-V cpu-feats (#17461 ) * ggml: add RISC-V cpu-feats Signed-off-by: Wang Yang <yangwang@iscas.ac.cn> * fix comment[1] --------- Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>	2025-11-24 13:07:14 +02:00
Xuan Son Nguyen	6ed192b4dd	add --models-allow-extra-args for security	2025-11-24 12:01:16 +01:00
william pan	4902eebe33	models : Added support for RND1 Diffusion Language Model (#17433 ) * Converted RND1 model to GGUF weights * RND1 llama.cpp support v1 * RND1 llama.cpp support v2 non causal bug * RND1 llama.cpp support v3 documentation * RND1 llama.cpp support v4 clean code * linting issues * RND1 pr fixes v1 * RND1 pr fixes v2 Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Diffusion documentation edits --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-11-24 14:16:56 +08:00
Max Krasnyansky	923ae3c619	hexagon: add support for ROPE_NEOX (#17458 )	2025-11-23 18:55:56 -08:00
Raul Torres	01ad35e6d6	CANN: Define `cann_graph_update_required` before macro (#17434 ) Description of the problem `cann_graph_update_required` is redundantly defined and initialized as `false` inside two mutually exclusive macro branches. Proposed solution Define it right before the macro so that it could serve both branches.	2025-11-24 10:02:52 +08:00
Aleksander Grygier	5ef3f990b9	chore: update webui build output	2025-11-24 02:24:27 +01:00
Aleksander Grygier	b2590a7f6c	refactor: Cleanup	2025-11-24 02:24:10 +01:00
M. Mediouni	fcb013847c	ggml-hexagon: Initial Hexagon v68/v69 support (#17394 ) * ggml-hexagon: fix build error with GCC Add stdexcept include to fix GCC build errors Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr> * ggml-hexagon: check VTCM acquire failures Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr> * ggml-hexagon: disable destination bypass on older than v73 v68 errors out if having bypass enabled when the VTCM is the destination. At least on v68 this made things actually work... not a proper fix though, so to look at later... Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr> * ggml-hexagon: add initial v68/v69 support v68 is the Hexagon revision notably used on the Snapdragon 8cx Gen 3 and the QCM6490. Also add support for v69. 8MB isn't a supported page size, so relax asked for page size constraint for HAP_compute_res_attr_set_vtcm_param_v2 to optimal. Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr> --------- Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>	2025-11-23 16:54:49 -08:00
Aleksander Grygier	13fe8607c5	refactor: Cleanup	2025-11-24 01:42:42 +01:00
Aleksander Grygier	76557cd5d3	Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2' into allozaur/server_model_management_v1_2	2025-11-24 00:36:00 +01:00
Aleksander Grygier	e808f2b2e6	chore: update webui build output	2025-11-23 23:45:08 +01:00
Aleksander Grygier	16747dee5b	refactor: UI badges	2025-11-23 23:44:14 +01:00
Aleksander Grygier	188d3236e4	chore: update webui build output	2025-11-23 23:28:49 +01:00
Aleksander Grygier	39fb1c2b17	refactor: Cleanup	2025-11-23 23:28:28 +01:00
nullname	d5bc1ad110	ggml-hexagon: add `hex_supported_buffer` for better buffer supported check (#17212 ) * hexagon: add buffer support checks for hexagon sessions * refactor: simplify buffer support checks in hexagon operations * hexagon: update buffer support checks to use tensor structure * refactor: streamline buffer initialization for DSP queue in hexagon operations * refactor: simplify buffer initialization in DSP queue for hexagon operations * refactor: optimize hex_supported_buffer function by fold expression * wip * refactor: simplify dspqueue_buffers_init function and its usage in hexagon operations * fix: improve nan handling at hvx_vec_fast_sigmoid_fp32_guard * refactor: optimize hvx_vec_inverse_fp32_guard for better nan handling * refactor: update hvx_vec_fast_sigmoid_fp32_guard to use adjusted exponent limits * refactor: modify hvx_vec_fast_sigmoid_fp32_guard to accept parameters for improved flexibility * refactor: update hvx_vec_exp_fp32_guard to accept max_exp and inf parameters to save some instructions * refactor: move hvx_vec_inverse_fp32_guard implementation to hvx-inverse.c for better perf	2025-11-23 14:26:36 -08:00
Aleksander Grygier	fb5445e9ce	chore: update webui build output	2025-11-23 23:25:05 +01:00
Aleksander Grygier	e92ce07916	refactor: Copy To Clipboard Icon component	2025-11-23 23:23:38 +01:00
Aleksander Grygier	219fd19eb8	chore: update webui build output	2025-11-23 23:09:09 +01:00
Aleksander Grygier	41764b8fa0	refactor: Formatters	2025-11-23 22:54:14 +01:00

7248 Commits