llama.cpp

Commit Graph

Author	SHA1	Message	Date
Aleksander Grygier	22e1ce2f81	webui: Fix clickability around chat processing statistics UI (#17278 ) * fix: Better pointer events handling in chat processing info elements * chore: update webui build output	2025-11-15 22:41:41 +01:00
Pascal	1411d9275a	webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI (#16618 ) * webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI - Purely visual and diagnostic change, no effect on model context, prompt construction, or inference behavior - Captured assistant tool call payloads during streaming and non-streaming completions, and persisted them in chat state and storage for downstream use - Exposed parsed tool call labels beneath the assistant's model info line with graceful fallback when parsing fails - Added tool call badges beneath assistant responses that expose JSON tooltips and copy their payloads when clicked, matching the existing model badge styling - Added a user-facing setting to toggle tool call visibility to the Developer settings section directly under the model selector option * webui: remove scroll listener causing unnecessary layout updates (model selector) * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * chore: npm run format & update webui build output * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-11-15 21:09:32 +01:00
Sigbjørn Skjæret	662192e1dc	convert : remove unnecessary chat template patching (#17289 )	2025-11-15 20:58:59 +01:00
bssrdf	fa7dd684bf	not working properly for channel numbers of 32, 48, 96 etc., ok for 64, 128...	2025-11-15 14:45:01 -05:00
Jeff Bolz	24dc769f1b	vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (#17287 ) These both show up in gpt-oss. Also, cleanup the mul_mat_vec fusion code a bit.	2025-11-15 19:54:23 +01:00
bssrdf	e489dd2773	WIP	2025-11-15 09:58:23 -05:00
Ruben Ortlam	4dca015b7e	vulkan: Replace 16-bit unpack8 calls to work around legacy Windows AMD driver bug (#17285 )	2025-11-15 15:18:58 +01:00
Sigbjørn Skjæret	9a8860cf5d	convert : use all parts in safetensors index (#17286 )	2025-11-15 14:12:39 +01:00
Sigbjørn Skjæret	9d3ef4809f	convert : set expert gating func in base class (#17279 )	2025-11-15 14:06:24 +01:00
Ankur Verma	c7b7db0445	mtmd-cli: Avoid logging to stdout for model loading messages in mtmd-cli (#17277 )	2025-11-15 12:41:16 +01:00
Giuseppe Scrivano	1568d13c2c	vulkan: implement ABS and NEG (#17245 ) * docs: update Vulkan ops * vulkan: add NEG op * vulkan: add ABS op --------- Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2025-11-15 12:00:29 +01:00
Jeff Bolz	439342ea0b	vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths (#17244 ) * vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths * set allow_misalign	2025-11-15 11:56:15 +01:00
Jeff Bolz	234ae7d7bd	vulkan: skip all-negative-inf blocks in FA (#17186 )	2025-11-15 10:37:25 +01:00
Jeff Bolz	38eaf32af1	vulkan: change graph_compute to be async and enable get_tensor_async (#17158 ) * vulkan: change graph_compute to be async and enable get_tensor_async This allows some additional CPU/GPU overlap for large pp workloads. Also seems to help a bit for token gen, maybe getting rid of a small bubble between graph_compute and get_tensor. Async set and copy functions seem to be very rarely used, so I didn't enable them because I didn't have a good way to test them. The async commands need to be ordered against each other, so put them all on the compute queue. The non-async commands still use the transfer queue. The fence for graph_compute/get_tensor_async is submitted and waited on in ggml_vk_synchronize. * fix thread safety errors * teardown context cleanly * Handle async read to non-pinned dst	2025-11-15 09:06:41 +01:00
bssrdf	e10b495dd2	add the missing guard	2025-11-15 01:24:09 -05:00
bssrdf	dbeb6ced46	WIP: debugging	2025-11-15 00:18:26 -05:00
bssrdf	378bb8368e	WIP: adding cp.async calls	2025-11-14 18:48:06 -05:00
bssrdf	11bd9806bf	add/fix GGML_UNUSED	2025-11-14 17:01:24 -05:00
bssrdf	e4fbece606	various small optimizations	2025-11-14 13:51:07 -05:00
bssrdf	ecbbdb6608	reducing integer ops	2025-11-14 13:05:31 -05:00
bssrdf	b4530b4f8b	disable m16n8k16 mma for ampere for now	2025-11-14 12:11:52 -05:00
bssrdf	0cb1ff419a	move some register to const memory space	2025-11-14 12:02:13 -05:00
bssrdf	b015e4b7dc	WIP: fixed bugs now results are correct	2025-11-14 11:10:34 -05:00
Xuan-Son Nguyen	9b17d74ab7	mtmd: add mtmd_log_set (#17268 )	2025-11-14 15:56:19 +01:00
Bartowski	e1fcf8b09b	model : add AfmoeForCausalLM support (#16477 ) * Add AFMOE model support * Update to vocab * Add model sizing * Undo Rope change for ARCEE model * Address review comments * Update modeling code is_sliding -> use_rope, replace hard-coded logic * Fix AFMOE tokenizer * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update AFMoE tokenizer class identification to be more unique --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-11-14 13:54:10 +01:00
Marek Hradil jr.	6cd0cf72ce	fix : Dangling pointer for non-empty trigger words in lazy grammar construction (#17048 ) * fix : Dangling pointer for non-empty trigger words in llama_sampler_init_grammar_impl (#17047) * Replace 'static' workaround, with keeping variable in scope for longer * Create std::array directly and pass into llama_grammar_init_impl * Add back the trigger pattern * Missed array include	2025-11-14 14:35:26 +02:00
Georgi Gerganov	d396b43748	server : fix "can batch with" bug (#17263 )	2025-11-14 14:03:45 +02:00
Georgi Gerganov	45c6ef7307	metal : support argsort for ne00 > 1024 (#17247 ) * metal : refactor argsort * cont : sort chunks * cont : merge sorted buckets * cont : cleanup	2025-11-14 09:36:06 +02:00
Georgi Gerganov	2606b0adab	metal : make the FA extra sizes consistent (#17143 )	2025-11-14 09:13:34 +02:00
ixgbe	307772fcda	readme : add RVV,ZVFH,ZFH,ZICBOP support for RISC-V (#17259 ) Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>	2025-11-14 09:12:56 +02:00
bssrdf	7d99222a61	WIP: debugging	2025-11-13 22:08:41 -05:00
Aleksander Grygier	f1bad23f88	Better UX for handling multiple attachments in WebUI (#17246 )	2025-11-14 01:19:08 +01:00
bssrdf	63c53fe1f1	WIP: move rs loop into block-k-loop following cutlass	2025-11-13 18:44:32 -05:00
bssrdf	8bfb7ed2f2	restore smem pointer at the end of every rs loop	2025-11-13 16:32:27 -05:00
Alberto Cabrera Pérez	becc4816dd	ggml-cpu: handle 3d tensors in repack mat_mul (#17241 ) * ggml-cpu: handle 3d tensors in repack mul_mat * Removed unnecessary branch, removed need for <algorithm> * Fixed dst_ptr pointer in chunk + clang_format * GGML_ASSERT to check wdata within bounds * Accidental ggml.h inclusion * Improved GGML_ASSERT on wdata boundaries * Address performance regression in Qwen and llama.cpp due to chunking	2025-11-13 12:53:00 -08:00
bssrdf	0939511846	change mac loop to match cutlass	2025-11-13 15:45:43 -05:00
Xuan-Son Nguyen	c4abcb2457	server: fixing naming conflict res_error (#17243 )	2025-11-13 20:53:47 +01:00
Piotr Wilkin (ilintar)	389ac78b26	ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (#17063 ) * Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM * Update ggml/include/ggml.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update tests/test-backend-ops.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Code review * Whitespace * Update tests/test-backend-ops.cpp Co-authored-by: Diego Devesa <slarengh@gmail.com> * This is actually sigmoid, duh. * Add CONST, remove TRI_KEEP, other changes from review * Update tests/test-backend-ops.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cuda/unary.cu Co-authored-by: Aman Gupta <amangupta052@gmail.com> * Remove extra script * Update ggml/src/ggml.c Co-authored-by: Diego Devesa <slarengh@gmail.com> * Update tests/test-backend-ops.cpp Co-authored-by: Diego Devesa <slarengh@gmail.com> * moving changes from laptop [no ci] * pre-rebase * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Refactor tests * ggml : cleanup * cont : fix ggml_fill srcs * tests : add note * ggml : add ggml_fill_inplace * ggml : add asserts * ggml : fix ggml_fill constant cast * cont : ggml_tri minor * Use TENSOR_LOCALS * Fix regression from #14596, regenerate * Don't make commits at night... --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: Aman Gupta <amangupta052@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-11-13 20:54:47 +02:00
Ruben Ortlam	a19bd6f7ce	vulkan: remove shell call from vulkan-shaders-gen tool, revert file check (#17219 ) * vulkan: remove shell call from vulkan-shaders-gen tool * use string vector for command execution * Fix condition * use string, remove const_cast * Fix dependency file quotation on Windows --------- Co-authored-by: Jeff Bolz <jbolz@nvidia.com>	2025-11-13 14:51:21 +01:00
Diego Devesa	dd091e52f8	sched : fix reserve ignoring user tensor assignments (#17232 )	2025-11-13 13:14:02 +01:00
ixgbe	1215dde7b0	ggml-cpu : add RISC-V vector intrinsic support for silu and cvar operations (#17227 ) Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>	2025-11-13 13:13:32 +01:00
bagheera	0cfb19166b	metal: accelerated conv2d (#17175 ) * metal: accelerated conv2d * cont : cleanup --------- Co-authored-by: bghira <bghira@users.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-13 13:32:44 +02:00
Georgi Gerganov	2776db6c81	Revert "ggml-cpu: handle 3d tensors in repack mat_mul (#17030 )" (#17233 ) This reverts commit `1c398dc9ec`.	2025-11-13 12:59:37 +02:00
Diego Devesa	879dec341a	ggml-cpu : use template for argsort (#17222 )	2025-11-13 10:59:05 +02:00
TecJesh	97d5117217	CANN: Add cross_entropy_loss op support (#16886 ) * update L2_NORM op support * update L2_NORM op support * remove extra whitespace * cann: update cross_entropy_loss op support * remove trailing whitespaces * rebase the latest code in the main repository and remove the l2_norm operator that already exists in another pull request. * undo the l2_norm operator deletion	2025-11-13 09:39:51 +08:00
Aman Gupta	a90eb94ca9	CUDA: fuse rope + set_rows (#16884 ) * CUDA: add fused rope * move k forward_expand up * create helper function instead of re-using params * make assert statement more in line with comment * rope_norm: coalesced writes to global mem	2025-11-13 08:50:01 +08:00
Neo Zhang Jianyu	07751f8d44	update SYCL support OPs (#17208 ) Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com>	2025-11-13 08:42:23 +08:00
o7si	ffb6f3d921	vocab : correct bounds check for UGM XCDA array access (#17215 )	2025-11-12 23:41:02 +01:00
Johannes Gäßler	5d6838b74f	CUDA: static assert to prevent misuse of memcpy_1 (#17198 )	2025-11-12 23:13:55 +01:00
Mike Abbott	92bb442ad9	docker : preserve .so symlinks for docker container builds (#17214 )	2025-11-12 20:33:55 +01:00

8027 Commits