llama.cpp

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	080b161995	completion : fix prompt cache for recurrent models (#19045)	2026-01-25 09:12:50 +02:00
Molly Sophia	1243f93a2d	readme: update RWKV7 model links (#19061) Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2026-01-25 09:11:19 +02:00
Jakkala Mahesh	24bc238303	llama: fix integer type consistency in split helpers (#18894) * llama: fix integer type consistency in split helpers * llama: apply minor style fixes * llama: remove trailing whitespace	2026-01-25 09:10:52 +02:00
Daniel Bevenius	16639ba217	common : use two decimal places for float arg help messages (#19048) * common : use two decimal places for float arg help messages This commit updates the help messages for various command-line arguments in arg.cpp to display floating-point default values with two decimal places instead of one. The motivation for this changes is that currently only having one decimal place means that values generated using --help or llama-gen-docs will not display the correct values. For example, currently the value of top-p in tools/server/README.md is `0.9`, but the default value is actually '0.95'. And running llama-gen-docs does not update this value as it uses the output from the help message, which shows only one decimal place, so the values look like they are unchanged. * docs : run llama-gen-docs to update docs	2026-01-25 07:31:42 +01:00
Bartowski	9981c30130	convert : fix conversion for inheriting models that were bypassing modify_tensors (#19064) * Add undo_permute = False where needed * Replace super().modify_tensors with ModelBase * Add one more ModelBase.modify_tensors * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-25 02:36:47 +01:00
Aleksander Grygier	97642211a9	chore: update webui build output	2026-01-25 02:10:25 +01:00
Aleksander Grygier	fc377123b7	refactor: Simplify MCP errors	2026-01-25 02:09:12 +01:00
Aleksander Grygier	202262c2dc	chore: update webui build output	2026-01-25 01:44:14 +01:00
Aleksander Grygier	b58b823b57	refactor: Types	2026-01-25 01:39:49 +01:00
Aleksander Grygier	ba39f8cc7b	chore: update webui build output	2026-01-25 01:21:34 +01:00
Aleksander Grygier	9bcfdc3483	refactor: DRY	2026-01-25 01:17:59 +01:00
Aleksander Grygier	e7ff091881	chore: Add deprecation comment	2026-01-25 01:05:28 +01:00
Aleksander Grygier	1c843b2863	chore: update webui build output	2026-01-25 01:04:34 +01:00
Aleksander Grygier	5dfc520d67	refactor: Cleanup	2026-01-25 00:48:21 +01:00
Aleksander Grygier	6daa39994c	refactor: Naming & Enums	2026-01-25 00:32:37 +01:00
Aleksander Grygier	2562dc50bd	chore: update webui build output	2026-01-25 00:32:16 +01:00
Aleksander Grygier	372202632e	refactor: Cleanup	2026-01-25 00:31:49 +01:00
Aleksander Grygier	ba230c5cce	refactor: Naming + remove redundant component	2026-01-24 23:58:17 +01:00
Aleksander Grygier	f7b5f62586	refactor: Remove unused code	2026-01-24 23:45:06 +01:00
Aleksander Grygier	22d9e645aa	chore: update webui build output	2026-01-24 23:39:04 +01:00
Aleksander Grygier	d938994395	refactor: Cleanup	2026-01-24 23:38:37 +01:00
Johannes Gäßler	e9fd8dcab4	llama-fit-params: keep explicit --ctx-size 0 (#19070)	2026-01-24 22:13:08 +01:00
Johannes Gäßler	4e5b83b226	GGUF: check that tensor size is representable (#19072)	2026-01-24 21:57:51 +01:00
Aleksander Grygier	fc4c392dce	chore: update webui build output	2026-01-24 20:54:24 +01:00
Aleksander Grygier	79e606eb99	refactor: Constants	2026-01-24 20:52:19 +01:00
Aleksander Grygier	3d7426cdd4	refactor: Cleanup	2026-01-24 20:47:32 +01:00
Aleksander Grygier	8bf2d38da1	chore: update webui build output	2026-01-24 20:32:53 +01:00
Aleksander Grygier	14911e51fc	feat: MCP Prompts implementation improvements	2026-01-24 20:30:52 +01:00
Aleksander Grygier	801ef93522	refactor: Message Height CSS Variable	2026-01-24 19:15:38 +01:00
Aleksander Grygier	13f756421c	refactor: Enums	2026-01-24 18:37:43 +01:00
Pascal	85b8da45f9	fix: resolve TypeScript error in tool response content	2026-01-24 18:04:01 +01:00
Xuan-Son Nguyen	bb02f74c61	chat: fix language input for translategemma (#19052) * chat: fix language input for translategemma * Update common/chat.cpp Co-authored-by: Aldehir Rojas <hello@alde.dev> --------- Co-authored-by: Aldehir Rojas <hello@alde.dev>	2026-01-24 17:58:45 +01:00
Pascal	9ddc54b668	webui: enable vision in agentic tool responses - Include images from all message roles (not just user) - Add multipart content support for tool responses - Images from MCP tools now accessible in same agentic turn	2026-01-24 17:58:20 +01:00
Aleksander Grygier	172e93d494	Merge remote-tracking branch 'ggml-org/master' into allozaur/mcp-mvp	2026-01-24 15:13:58 +01:00
Aleksander Grygier	da9c245838	chore: update webui build output	2026-01-24 13:59:52 +01:00
Aleksander Grygier	7c4bedda87	feat: Improve formatting performance time	2026-01-24 13:58:23 +01:00
Aleksander Grygier	c39c6ef436	fix: System prompt sorting	2026-01-24 13:44:41 +01:00
Aleksander Grygier	2601bf0f59	fix: Save draft message in Chat Form when adding System Prompt from new chat view	2026-01-24 13:32:49 +01:00
Aleksander Grygier	a647edfc0b	fix: Chat Form submission	2026-01-24 12:33:24 +01:00
Johannes Gäßler	8f91ca54ec	CUDA: re-use MLA K data for V in MMA FA (#19057)	2026-01-24 10:09:36 +01:00
Aman Gupta	81ab64f3c8	ggml-cuda: enable cuda-graphs for `n-cpu-moe` (#18934) * ggml-cuda: add split-wise cuda graph * add n-cpu-moe compare_llama_bench.py * fix hip/musa builds	2026-01-24 14:25:20 +08:00
nullname	8af1f5f430	ggml-hexagon: flash-attn opt (#19025) * optimize flash attention kernel by improving score computation and online softmax update * wip * Refactor online softmax update in flash attention kernel for improved performance * Optimize flash attention kernel by replacing float array with HVX_Vector for score computation * wip	2026-01-23 22:02:07 -08:00
Aleksander Grygier	bd16b6145c	chore: update webui build output	2026-01-24 01:32:36 +01:00
Aleksander Grygier	8428741034	feat: MCP Prompts WIP	2026-01-24 01:26:17 +01:00
Georgi Gerganov	557515be1e	graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898) * graph : avoid branches between embedding and token inputs * models : make deepstack graphs (e.g. Qwen3 VL) have constant topology * ci : enable -DGGML_SCHED_NO_REALLOC=ON for server CI * cont : pad token embeddings to n_embd_inp	2026-01-23 18:22:34 +02:00
Aleksander Grygier	3d88d0b6b2	chore: update webui build output	2026-01-23 15:21:56 +01:00
Aleksander Grygier	9c391d8e0d	feat: UI improvements	2026-01-23 15:21:03 +01:00
Neo Zhang	cb6caca191	[SYCL] use malloc to support both iGPU and dGPU in same time (#18992) * use malloc to support both iGPU and dGPU in same time * support windows --------- Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>	2026-01-23 20:54:10 +08:00
Xuan-Son Nguyen	b5b8fa1c8b	chat : fix translategemma crash on common_chat_format_example (#19019)	2026-01-23 12:03:42 +01:00
Daniel Bevenius	a14b960bc7	model-conversion : use BUILD_DIR variable in all scripts (#19015) This commit modifies all the utility scripts to use an optional BUILD_DIR variable/argument to specify the build directory. The motivation for this is that Commit `3d55846a5c` ("model-conversion : add BUILD_DIR variable to run-converted-model scripts") introduced this variable to the causal and embeddings scripts, but I missed the scripts in the utils directory.	2026-01-23 09:01:36 +01:00

8137 Commits