ddh0
6934780669
optimize
2025-12-14 16:26:15 -06:00
ddh0
36b526d768
Merge branch 'master' into power-law-sampler
2025-12-14 15:43:49 -06:00
Xuan-Son Nguyen
52392291b2
preset: handle negated arg, reverse the meaning if needed (#18041)
2025-12-14 22:08:10 +01:00
Sigbjørn Skjæret
5c8a717128
convert : refactor rope scaling handling (#18013)
...
* refactor rope scaling handling
* ws--
* missed a couple
* use find_hparam
2025-12-14 16:04:37 +01:00
Haowei Wu
37f5a1093b
mtmd: enhance image resizing in llava_uhd (#18014)
2025-12-14 15:57:52 +01:00
Ruben Ortlam
9e6649ecf2
vulkan: fix mul_mat_vec_iq1_s formatting (#18026)
2025-12-14 14:52:46 +01:00
Xuan-Son Nguyen
0759b09c90
graph: add f_attn_temp_offset (#18025)
2025-12-14 13:05:59 +01:00
ddh0
667b70fdac
update default decay
2025-12-14 03:41:28 -06:00
ddh0
ec54fe5f14
no, but does this?
2025-12-14 02:54:14 -06:00
Georgi Gerganov
254098a279
common : refactor common_sampler + grammar logic changes (#17937)
...
* common : refactor common_sampler + grammar logic changes
* tests : increase max_tokens to get needed response
* batched : fix uninitialized samplers
2025-12-14 10:11:13 +02:00
Jeff Bolz
3238b1400c
vulkan: Fix data race/hang in scalar/cm1 flash attention (#17887)
2025-12-14 09:00:00 +01:00
ddh0
2a3f579d1f
does this fix it?
2025-12-14 01:55:02 -06:00
lovedheart
4722671641
vulkan: improve mul_mat_vec_iq1_s speed (#17874)
2025-12-14 08:47:49 +01:00
Eve
d15d177f43
vulkan: faster q6_k matmul (#17813)
...
* q6_k faster mul mat
* 8 values
* fix comment
* switch to two at a time
* start ci for .glsl files
2025-12-14 08:29:37 +01:00
Georgi Gerganov
77ad8542bd
model-conversion : cast logits to float32 (#18009)
2025-12-14 08:58:13 +02:00
ddh0
9613c48172
with logging
2025-12-14 00:36:59 -06:00
Georgi Gerganov
609a2d0268
models : fix YaRN regression + consolidate logic (#18006)
...
* models : fix YaRN regression + consolidate logic
* cont : fix the fix
* cont : remove header
* cont : add header
2025-12-14 08:34:56 +02:00
Georgi Gerganov
a63cbafbbc
ggml : arm repack fix build
2025-12-14 08:33:51 +02:00
Georgi Gerganov
0e59224990
sync : ggml
2025-12-14 08:33:51 +02:00
Georgi Gerganov
71fdcf0616
ggml : arm repack fix build (whisper/0)
2025-12-14 08:33:51 +02:00
Congcong Cai
615655aafe
cmake : set `CMAKE_RUNTIME_OUTPUT_DIRECTORY` for non standalone build (ggml/1394)
...
Some backends depend on CMAKE_RUNTIME_OUTPUT_DIRECTORY to create temporary files, e.g. the Metal backend.
A missing CMAKE_RUNTIME_OUTPUT_DIRECTORY can cause cmake errors such as permission denied (attempting to copy files to root).
This PR sets up a default path for CMAKE_RUNTIME_OUTPUT_DIRECTORY when it is not already set.
2025-12-14 08:33:51 +02:00
ddh0
d1e5c60442
add missing values to `common_params_sampling::print()`
2025-12-13 23:26:03 -06:00
ddh0
965bcc9dc4
fix leftover `window_size`
2025-12-13 22:19:15 -06:00
ddh0
b8a9626a73
oops forgot args.cpp
2025-12-13 22:17:08 -06:00
ddh0
a96ddd743a
re-write + change parameters + simplify
2025-12-13 22:15:03 -06:00
ddh0
67a733670e
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-13 17:27:35 -06:00
Xuan-Son Nguyen
c00ff929dc
scripts: add script to compare logprobs of llama.cpp against other frameworks (#17947)
...
* scripts: add script to compare logits of llama.cpp against other frameworks
* accept custom prompt file
* fix code style
* clarify endpoint
* fix displaying
* use abs for diff
* fix vllm case
* rm output file
* rename to compare-logprobs
* add "pattern"
2025-12-13 22:33:29 +01:00
Sergey Fedorov
4ed2bae50d
server-models.cpp: add missing <filesystem> (#18000)
...
Fixes: https://github.com/ggml-org/llama.cpp/issues/17999
2025-12-13 22:02:43 +01:00
Jeff Bolz
5266379bca
llama_context: synchronize before reallocating output buffer (#17974)
2025-12-13 09:19:51 -06:00
Xuan-Son Nguyen
4d5ae24c0a
arg: fix common_params_parse not accepting negated arg (#17991)
2025-12-13 12:53:37 +01:00
Gustavo Rocha Dias
66ba51252e
cmake: correct scope - link ws2_32 for MinGW/w64devkit builds in cpp-httplib (#17972)
...
* fix - w64devkit build
* fix - w64devkit build private scope
2025-12-13 12:46:36 +01:00
Jeff Bolz
36255a2268
vulkan: support get_rows for i32 (#17941)
2025-12-13 10:12:53 +01:00
Jeff Bolz
3229a23fa6
vulkan: support GGML_OP_DIAG (#17893)
2025-12-13 10:07:49 +01:00
Jeff Bolz
303f8615e9
vulkan: Multi-pass softmax for large number of cols (#17892)
...
When the number of cols is large, split each row across multiple workgroups.
There are three phases that communicate partial results through temp buffers:
(1) compute max partials
(2) take max of partials, compute sum(exp(x-max)) partials
(3) sum partials, compute scaled result
2025-12-13 10:04:29 +01:00
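The three-phase scheme described in the commit above can be sketched in plain Python, with each chunk standing in for one workgroup's slice of the row. This is an illustrative sketch of the algorithm, not the shader's actual layout; the function name and chunk size are made up for the example:

```python
import math

def multipass_softmax(row, chunk=4):
    # Split the row across "workgroups" (chunks), as done when
    # the number of columns is too large for a single workgroup.
    chunks = [row[i:i + chunk] for i in range(0, len(row), chunk)]

    # Phase 1: each workgroup computes its partial max.
    max_partials = [max(c) for c in chunks]

    # Phase 2: take the max of the partials, then each workgroup
    # computes its partial sum of exp(x - max).
    m = max(max_partials)
    sum_partials = [sum(math.exp(x - m) for x in c) for c in chunks]

    # Phase 3: sum the partials and emit the scaled result.
    s = sum(sum_partials)
    return [math.exp(x - m) / s for x in row]
```

In the shader the partial max and partial sums would live in temp buffers between dispatches; here they are just intermediate lists, but the numerical result matches a single-pass softmax.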
Georgi Gerganov
3c6391e748
speculative-simple : free batch on exit (#17985)
2025-12-13 09:48:34 +02:00
Sigbjørn Skjæret
8e4d678528
common : skip model validation when --completion-bash is requested (#17975)
2025-12-13 08:40:50 +01:00
Jeff Bolz
07a10c1090
vulkan: Allow non-pow2 n_experts in topk_moe (#17872)
2025-12-13 08:40:04 +01:00
Sigbjørn Skjæret
2bc94e7928
add llama-completion to completion-bash executables (#17976)
2025-12-13 08:35:50 +01:00
Daniel Bevenius
fd1085ffb7
model-conversion : use CONVERTED_MODEL value for converted model [no ci] (#17984)
...
* model-conversion : use CONVERTED_MODEL value for converted model [no ci]
This commit updates the model verification scripts to use the
CONVERTED_MODEL environment variable instead of using the MODEL_PATH
(the original model path) as the basis for the converted model file
name.
The motivation for this is that currently, if the converted model file
name differs from the original model directory/name, the verification
scripts will look for the wrong .bin files that were generated when
running the models.
For example, the following steps were not possible:
```console
(venv) $ huggingface-cli download google/gemma-3-270m-it --local-dir ggml-org/gemma-3-270m
(venv) $ python3 convert_hf_to_gguf.py ggml-org/gemma-3-270m --outfile test-bf16.gguf --outtype bf16
(venv) $ cd examples/model-conversion/
(venv) $ export MODEL_PATH=../../ggml-org/gemma-3-270m
(venv) $ export CONVERTED_MODEL=../../test-bf16.gguf
(venv) $ make causal-verify-logits
...
Data saved to data/llamacpp-test-bf16.bin
Data saved to data/llamacpp-test-bf16.txt
Error: llama.cpp logits file not found: data/llamacpp-gemma-3-270m.bin
Please run scripts/run-converted-model.sh first to generate this file.
make: *** [Makefile:62: causal-verify-logits] Error 1
```
With the changes in this commit, the above steps will now work as
expected.
2025-12-13 08:34:26 +01:00
ddh0
1879fc6dc6
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-13 01:17:53 -06:00
ddh0
824bb3aa6e
fix compiler warning, add commented-out logging per token
2025-12-13 00:23:15 -06:00
ddh0
0a19a3fd6c
remove old debug log, style nit
2025-12-12 23:45:45 -06:00
ddh0
94cb883ed9
copy from author
...
ref:
https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069
2025-12-12 23:19:08 -06:00
ddh0
53380c183f
add missing parameters in `server-task.cpp`
2025-12-12 22:39:51 -06:00
Xuan-Son Nguyen
380b4c984e
common: support negated args (#17919)
...
* args: support negated args
* update docs
* fix typo
* add more neg options
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* rm duplicated arg
* fix LLAMA_ARG_NO_HOST
* add test
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-12 23:58:53 +01:00
Xuan-Son Nguyen
e39a2ce66d
clip: move model cgraphs into their own files (#17965)
...
* clip: move model cgraphs into their own files
* more explicit enums
* fix linux build
* fix naming
* missing headers
* nits: add comments for contributors
2025-12-12 21:14:48 +01:00
jiahao su
a8c7f33d79
ci : change the cann version and the container pull method (#17953)
...
fix error format
Update build.yml
Remove unnecessary zip files
fix
update
2025-12-12 20:43:00 +01:00
Sigbjørn Skjæret
b7f5f46e03
docker : include legacy llama-completion binary (#17964)
2025-12-12 19:39:23 +01:00
Johannes Gäßler
482211438d
CUDA: fix overflow in MMA kernel without stream-k (#17939)
2025-12-12 17:43:58 +01:00
Georgi Gerganov
7bed317f53
models : fix the attn_factor for mistral3 graphs + improve consistency (#17945)
...
* models : fix the attn_factor for mistral3 graphs
* cont : rework attn_factor correction logic
* cont : make deepseek2 consistent
* cont : add TODO
* cont : special-case DSv2
* cont : revert Mistral 3 Large changes
* cont : fix DS2 to use the original attn_factor
* cont : minor comments
2025-12-12 17:12:40 +02:00