llama.cpp

Commit Graph

Author	SHA1	Message	Date
Aleksander Grygier	22d9e645aa	chore: update webui build output	2026-01-24 23:39:04 +01:00
Aleksander Grygier	d938994395	refactor: Cleanup	2026-01-24 23:38:37 +01:00
Aleksander Grygier	fc4c392dce	chore: update webui build output	2026-01-24 20:54:24 +01:00
Aleksander Grygier	79e606eb99	refactor: Constants	2026-01-24 20:52:19 +01:00
Aleksander Grygier	3d7426cdd4	refactor: Cleanup	2026-01-24 20:47:32 +01:00
Aleksander Grygier	8bf2d38da1	chore: update webui build output	2026-01-24 20:32:53 +01:00
Aleksander Grygier	14911e51fc	feat: MCP Prompts implementation improvements	2026-01-24 20:30:52 +01:00
Aleksander Grygier	801ef93522	refactor: Message Height CSS Variable	2026-01-24 19:15:38 +01:00
Aleksander Grygier	13f756421c	refactor: Enums	2026-01-24 18:37:43 +01:00
Pascal	85b8da45f9	fix: resolve TypeScript error in tool response content	2026-01-24 18:04:01 +01:00
Pascal	9ddc54b668	webui: enable vision in agentic tool responses - Include images from all message roles (not just user) - Add multipart content support for tool responses - Images from MCP tools now accessible in same agentic turn	2026-01-24 17:58:20 +01:00
Aleksander Grygier	172e93d494	Merge remote-tracking branch 'ggml-org/master' into allozaur/mcp-mvp	2026-01-24 15:13:58 +01:00
Aleksander Grygier	da9c245838	chore: update webui build output	2026-01-24 13:59:52 +01:00
Aleksander Grygier	7c4bedda87	feat: Improve formatting performance time	2026-01-24 13:58:23 +01:00
Aleksander Grygier	c39c6ef436	fix: System prompt sorting	2026-01-24 13:44:41 +01:00
Aleksander Grygier	2601bf0f59	fix: Save draft message in Chat Form when adding System Prompt from new chat view	2026-01-24 13:32:49 +01:00
Aleksander Grygier	a647edfc0b	fix: Chat Form submission	2026-01-24 12:33:24 +01:00
Johannes Gäßler	8f91ca54ec	CUDA: re-use MLA K data for V in MMA FA (#19057 )	2026-01-24 10:09:36 +01:00
Aman Gupta	81ab64f3c8	ggml-cuda: enable cuda-graphs for `n-cpu-moe` (#18934 ) * ggml-cuda: add split-wise cuda graph * add n-cpu-moe compare_llama_bench.py * fix hip/musa builds	2026-01-24 14:25:20 +08:00
nullname	8af1f5f430	ggml-hexagon: flash-attn opt (#19025 ) * optimize flash attention kernel by improving score computation and online softmax update * wip * Refactor online softmax update in flash attention kernel for improved performance * Optimize flash attention kernel by replacing float array with HVX_Vector for score computation * wip	2026-01-23 22:02:07 -08:00
Aleksander Grygier	bd16b6145c	chore: update webui build output	2026-01-24 01:32:36 +01:00
Aleksander Grygier	8428741034	feat: MCP Prompts WIP	2026-01-24 01:26:17 +01:00
Georgi Gerganov	557515be1e	graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898 ) * graph : avoid branches between embedding and token inputs * models : make deepstack graphs (e.g. Qwen3 VL) have constant topology * ci : enable -DGGML_SCHED_NO_REALLOC=ON for server CI * cont : pad token embeddings to n_embd_inp	2026-01-23 18:22:34 +02:00
Aleksander Grygier	3d88d0b6b2	chore: update webui build output	2026-01-23 15:21:56 +01:00
Aleksander Grygier	9c391d8e0d	feat: UI improvements	2026-01-23 15:21:03 +01:00
Neo Zhang	cb6caca191	[SYCL] use malloc to support both iGPU and dGPU in same time (#18992 ) * use malloc to support both iGPU and dGPU in same time * support windows --------- Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>	2026-01-23 20:54:10 +08:00
Xuan-Son Nguyen	b5b8fa1c8b	chat : fix translategemma crash on common_chat_format_example (#19019 )	2026-01-23 12:03:42 +01:00
Daniel Bevenius	a14b960bc7	model-conversion : use BUILD_DIR variable in all scripts (#19015 ) This commit modifies all the utility scripts to use an optional BUILD_DIR variable/argument to specify the build directory. The motivation for this is that Commit `3d55846a5c` ("model-conversion : add BUILD_DIR variable to run-converted-model scripts") introduced this variable to the causal and embeddings scripts, but I missed the scripts in the utils directory.	2026-01-23 09:01:36 +01:00
Alberto Cabrera Pérez	091a46cb8d	ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) (#18860 ) * Boilerplate for q5_Kx8 REPACK on ARM and fallback Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * Implements make_block_q5_Kx8 by extending make_block_q4_Kx8 Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * q5_K repack gemm and gemv generics * Gemm and Gemv ARM implementations (i8mm) * Improved qh manipulation looking at non-repack vec_dot implementation * Full unroll * Apply Q5_K Gemv vand and vshl optimizations to gemm. Improve comments. Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * Fix wrong fallback definitions of Q5_K Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * Fixed comments. Reverted unnecessary formatting Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * Fixed typo in generic definitions * Switching AND + Shift with Shift Insert. Better op interleaving. * Vectorize + unroll the block scales * Apply gemm optimizations to gemv * Improve bias calculation --------- Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>	2026-01-23 09:55:08 +02:00
Aldehir Rojas	a3e812811d	cli : load parser definition (#19031 ) * cli : load parser definition * cont : only unload if a parser is defined	2026-01-22 20:31:22 -06:00
Xuan-Son Nguyen	51fa458a92	server : support preserving reasoning_content in assistant message (#18994 ) * support reasoning_content input * report template caps to webui * add docs * rm commented code	2026-01-22 21:30:06 +01:00
Georgi Gerganov	a5eaa1d6a3	mla : make the V tensor a view of K (#18986 ) * mla : pass V as a view of K to the FA op * cuda : adjust mla logic to new layout * kv-cache : fix rope shift * tests : remove comment * cuda : fix reusable_cutoff Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2026-01-22 22:09:01 +02:00
Johannes Gäßler	e2baf02162	CUDA: fix alignment check for FA (#19023 )	2026-01-22 20:39:25 +01:00
Aman Gupta	e34d6d03b2	convert_hf_to_gguf.py: refactor modify_tensors to call super (#18866 )	2026-01-23 02:58:07 +08:00
lhez	9c96465f99	opencl: enable the general fp mm for non-cont input and as a fallback for specialized kqv kernel for adreno (#18970 ) * opencl: add `copy_to_contiguous` and utilize mm kernels * opencl: only copy to cont for f32 and f16 tensors * opencl: use cont mm for fallback when dst is large * opencl: use nb local to copy-to-cont * opencl: use local offset as well	2026-01-22 10:29:25 -08:00
Xuan-Son Nguyen	4e595b250a	server: do not log certain endpoints (avoid log spam) (#19028 )	2026-01-22 19:24:37 +01:00
Aleksander Grygier	963711cccb	chore: update webui build output	2026-01-22 18:20:55 +01:00
Aleksander Grygier	6018f85c65	feat: Architectural improvements	2026-01-22 18:19:37 +01:00
Aleksander Grygier	c02e83c32a	feat: Per-conversation agentic loop state	2026-01-22 17:38:51 +01:00
Georgi Gerganov	0e4ebeb057	quant : manual overrides of tensor types take precedence (#18952 )	2026-01-22 16:17:06 +02:00
Aaron Teo	8b30840703	release: update github api (#19022 )	2026-01-22 21:38:02 +08:00
Xuan-Son Nguyen	9eb5bfec1a	mtmd : update docs to use llama_model_n_embd_inp (#18999 )	2026-01-22 14:36:32 +01:00
손희준	c6926d1d95	server: Reorder methods in `server-task.cpp` (#19016 ) * Move `task_result_state::update_chat_msg` to match with header * Move `server_task_result_cmpl_partial::to_json_anthropic()` to match with header --------- Co-authored-by: openingnow <>	2026-01-22 14:36:04 +01:00
Aman Gupta	b70d251076	CUDA: add gqa_ratio 4 for GLM 4.7 flash (#18953 )	2026-01-22 18:51:53 +08:00
shaofeiqi	5516b9c16a	opencl: add TRI op support (#18979 )	2026-01-21 22:05:54 -08:00
Aleksei Nikiforov	94242a62c0	ggml-zdnn : mark zDNN buffers as non-host (#18967 ) While buffers reside in host memory, additional transformation is needed to use buffers with zDNN. Fixes #18848	2026-01-22 01:16:21 +01:00
Pádraic Slattery	6b99a223e3	ci : update GitHub Actions versions [no ci] (#18935 )	2026-01-22 00:57:18 +01:00
Mariusz Woloszyn	77078e80e5	convert : add Devstral-2 (Ministral3ForCausalLM) arch (#18972 ) * Add Ministral3ForCausalLM architeture This adds support for newer architectres like Devstral-2 * removed blank line found after function decorator Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-22 00:55:55 +01:00
Piotr Wilkin (ilintar)	c301172f66	jinja: support none|string (#18995 ) * jinja: support none|string * Update common/jinja/value.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update tests/test-jinja.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Add as_string() --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-21 19:24:37 +01:00
Hendrik Erz	3802d3c78f	fix: Use `tabular-nums` for chat message statistics (#18915 ) * fix: Use `tabular-nums` for chat message statistics * fix: Rebuild WebUI	2026-01-21 18:46:01 +01:00


7961 Commits
