llama.cpp

Commit Graph

Author	SHA1	Message	Date
Aleksander Grygier	82f26ad8e4	refactor: Cleanup	2026-01-26 15:33:27 +01:00
Aleksander Grygier	5bf1c86635	refactor: Cleanup refactor: Cleanup refactor: Cleanup refactor: Cleanup	2026-01-26 15:28:50 +01:00
Sigbjørn Skjæret	142cbe2ac6	ci : use new 1vCPU runner for lightweight jobs (#19107) * use new 1vCPU runner for lightweight jobs * pyright is too heavy, look into ty some day use new pip-install input	2026-01-26 15:22:49 +01:00
Aleksander Grygier	7b127db90c	chore: update webui build output	2026-01-26 15:07:47 +01:00
Aleksander Grygier	717a868c23	feat: Mcp Server Selector	2026-01-26 15:03:05 +01:00
Aleksander Grygier	e566d6641e	fix: Scroll issues in DropdownMenuSearchable	2026-01-26 14:41:15 +01:00
Aleksander Grygier	d675f403e3	chore: update webui build output	2026-01-26 14:33:58 +01:00
Aleksander Grygier	ee0f0b277f	feat: Improve Code blocks rendering + add auto scroll + improve global scroll bar behavior	2026-01-26 14:32:40 +01:00
Aleksander Grygier	6586ae71d2	chore: update webui build output	2026-01-26 12:34:21 +01:00
Aleksander Grygier	c631e26a3f	refactor: Components imports/exports structure & documentation	2026-01-26 12:30:53 +01:00
Georgi Gerganov	56f3ebf38e	model : add correct type for GLM 4.7 Flash (#19106)	2026-01-26 11:24:30 +02:00
Aleksander Grygier	b7d1de68c3	refactor: Cleanup	2026-01-26 09:54:44 +01:00
Aleksander Grygier	0a66568fc9	chore: update webui build output	2026-01-26 09:37:27 +01:00
Aleksander Grygier	fa0cad2e6e	refactor: Componentize Chat Form Prompt Picker	2026-01-26 09:36:13 +01:00
Aleksander Grygier	176abf3175	refactor: Utility function	2026-01-26 09:00:41 +01:00
Aleksander Grygier	5ee232d81c	refactor: Use store methods	2026-01-26 08:52:57 +01:00
Johannes Gäßler	0c21677e43	CUDA: faster FA for GQA > 1 but not power of 2 (#19092)	2026-01-25 21:19:47 +01:00
ccbinn	0440bfd160	metal : fix recommendedMaxWorkingSetSize availability on legacy iOS/macOS (#19088) Co-authored-by: chenbin11 <chenbin11@kuaishou.com>	2026-01-25 20:07:19 +02:00
Sigbjørn Skjæret	0bf5636938	convert : yield Gemma3N custom_map tensors directly (#19091)	2026-01-25 18:03:34 +01:00
Aman Gupta	bcb43163ae	ggml-cpu: Use tiled FA for prompt-processing (#19012) * ggml-cpu: Use tiled FA for prompt-processing the FA performance is gimped on CPU on long contexts because it essentially uses a vector kernel. This PR adds a tiled FA for PP. Perf tuning for tile sizes done on a AMD EPYC single-socket 64-c machine. * fix out of bounds for mask * skip rows where there are all masks * skip tile if mask is inf * store mask in worksize * check inf tile earlier	2026-01-25 23:25:58 +08:00
Georgi Gerganov	d9c6ce46f7	kv-cache : support V-less cache (#19067) * kv-cache : support V-less cache * cuda : better check for V_is_K_view * cuda : improve V_is_K_view check * graph : add comments * hparams : refactor	2026-01-25 15:48:56 +02:00
Aleksander Grygier	ff0e927be2	chore: update webui build output	2026-01-25 13:38:25 +01:00
Aleksander Grygier	ee9efae203	refactor: Enums	2026-01-25 13:37:08 +01:00
Aleksander Grygier	7f5284d597	refactor: Cleanup refactor: Cleanup refactor: Cleanup refactor: Cleanup	2026-01-25 13:13:11 +01:00
Sigbjørn Skjæret	70d860824a	convert : fix Gemma3N, GraniteMoe and Ernie4.5Moe (#19084) * fix Gemma3N and Ernie4.5Moe * fix GraniteMoe	2026-01-25 13:05:05 +01:00
Georgi Gerganov	080b161995	completion : fix prompt cache for recurrent models (#19045)	2026-01-25 09:12:50 +02:00
Molly Sophia	1243f93a2d	readme: update RWKV7 model links (#19061) Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2026-01-25 09:11:19 +02:00
Jakkala Mahesh	24bc238303	llama: fix integer type consistency in split helpers (#18894) * llama: fix integer type consistency in split helpers * llama: apply minor style fixes * llama: remove trailing whitespace	2026-01-25 09:10:52 +02:00
Daniel Bevenius	16639ba217	common : use two decimal places for float arg help messages (#19048) * common : use two decimal places for float arg help messages This commit updates the help messages for various command-line arguments in arg.cpp to display floating-point default values with two decimal places instead of one. The motivation for this changes is that currently only having one decimal place means that values generated using --help or llama-gen-docs will not display the correct values. For example, currently the value of top-p in tools/server/README.md is `0.9`, but the default value is actually '0.95'. And running llama-gen-docs does not update this value as it uses the output from the help message, which shows only one decimal place, so the values look like they are unchanged. * docs : run llama-gen-docs to update docs	2026-01-25 07:31:42 +01:00
Bartowski	9981c30130	convert : fix conversion for inheriting models that were bypassing modify_tensors (#19064) * Add undo_permute = False where needed * Replace super().modify_tensors with ModelBase * Add one more ModelBase.modify_tensors * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-25 02:36:47 +01:00
Aleksander Grygier	97642211a9	chore: update webui build output	2026-01-25 02:10:25 +01:00
Aleksander Grygier	fc377123b7	refactor: Simplify MCP errors	2026-01-25 02:09:12 +01:00
Aleksander Grygier	202262c2dc	chore: update webui build output	2026-01-25 01:44:14 +01:00
Aleksander Grygier	b58b823b57	refactor: Types	2026-01-25 01:39:49 +01:00
Aleksander Grygier	ba39f8cc7b	chore: update webui build output	2026-01-25 01:21:34 +01:00
Aleksander Grygier	9bcfdc3483	refactor: DRY	2026-01-25 01:17:59 +01:00
Aleksander Grygier	e7ff091881	chore: Add deprecation comment	2026-01-25 01:05:28 +01:00
Aleksander Grygier	1c843b2863	chore: update webui build output	2026-01-25 01:04:34 +01:00
Aleksander Grygier	5dfc520d67	refactor: Cleanup	2026-01-25 00:48:21 +01:00
Aleksander Grygier	6daa39994c	refactor: Naming & Enums	2026-01-25 00:32:37 +01:00
Aleksander Grygier	2562dc50bd	chore: update webui build output	2026-01-25 00:32:16 +01:00
Aleksander Grygier	372202632e	refactor: Cleanup	2026-01-25 00:31:49 +01:00
Aleksander Grygier	ba230c5cce	refactor: Naming + remove redundant component	2026-01-24 23:58:17 +01:00
Aleksander Grygier	f7b5f62586	refactor: Remove unused code	2026-01-24 23:45:06 +01:00
Aleksander Grygier	22d9e645aa	chore: update webui build output	2026-01-24 23:39:04 +01:00
Aleksander Grygier	d938994395	refactor: Cleanup	2026-01-24 23:38:37 +01:00
Johannes Gäßler	e9fd8dcab4	llama-fit-params: keep explicit --ctx-size 0 (#19070)	2026-01-24 22:13:08 +01:00
Johannes Gäßler	4e5b83b226	GGUF: check that tensor size is representable (#19072)	2026-01-24 21:57:51 +01:00
Aleksander Grygier	fc4c392dce	chore: update webui build output	2026-01-24 20:54:24 +01:00
Aleksander Grygier	79e606eb99	refactor: Constants	2026-01-24 20:52:19 +01:00

8112 Commits, All Branches