llama.cpp

Commit Graph

Author	SHA1	Message	Date
Ed Addario	5aca2561a1	Merge branch 'master' into imatrix	2025-08-21 22:19:54 +01:00
Michael Giba	b108e42904	ci : fix -Werror=return-type in clip.cpp so ci/run.sh can run without issue (#15221 ) * Fix -Werror=return-type so ci/run.sh can run * Update tools/mtmd/clip.cpp Co-authored-by: Diego Devesa <slarengh@gmail.com> * Remove false now that we have abort --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-08-21 12:06:46 +02:00
stduhpf	1b0db8f6e0	server : fix webui (#15462 ) * Fix webui crash after streaming * build webui	2025-08-21 08:19:22 +03:00
teo	1bc664a26a	server: fix OpenAI API compatibility for usage statistics in chat streams (#15444 )	2025-08-21 00:10:08 +02:00
xiaobing318	1a99c2d948	cmake : fix target include directories (#15450 ) * Update docker.yml Modify docker.yml so that the workflow stops running periodically; it can still be started manually when needed * feat: Modify the header file include path 1. There's no llava directory in the tools directory. 2. Because the command `target_include_directories(mtmd PUBLIC .)` is used in the `mtmd` CMakeLists.txt file, other targets that link against `mtmd` automatically include the `mtmd` directory as a search path for header files. Therefore, you can remove `target_include_directories(${TARGET} PRIVATE ../llava)` or use `target_include_directories(${TARGET} PRIVATE ../mtmd)` to explicitly require the `llama-server` target to use header files from `mtmd`. * Restore the docker.yml file	2025-08-20 13:32:05 +03:00
Georgi Gerganov	d2fcd91cf9	server : disable context shift by default (#15416 ) * server : disable context shift by default ggml-ci * server : make scope of test parameters local	2025-08-19 16:46:37 +03:00
Georgi Gerganov	f0d3c7405c	batched-bench : use rand tokens (#15398 )	2025-08-19 08:45:12 +03:00
Xuan-Son Nguyen	f08c4c0d8d	mtmd : clean up clip_n_output_tokens (#15391 )	2025-08-18 22:53:52 +02:00
Sigbjørn Skjæret	baa9255a45	llama : merge conts and reshapes and remove unnecessary cont (#15380 ) * remove unnecessary conts and merge reshapes * restore necessary conts * merge more conts and reshapes * merge even more conts and reshapes	2025-08-18 19:30:17 +02:00
davidef	d1d8241600	server : fix incoming tasks not process in order (#15395 )	2025-08-18 17:51:42 +03:00
Oleksandr Kuvshynov	e5155e6986	server : export max observed n_past value (#15361 ) Add tracking for high watermark cache usage and make it available in /metrics endpoint. Use-case: Tracking largest needed cache usage under realistic workload to better understand memory requirements and be able to adjust cache size/quantization for model/cache accordingly.	2025-08-18 00:28:58 +02:00
Ed Addario	630750fdef	Validate number of elements if in_sum is present	2025-08-17 09:42:18 +01:00
Ed Addario	1f72bc157f	Avoid using if statements with initialiser	2025-08-17 08:35:17 +01:00
Ed Addario	f6934b9417	Merge branch 'imatrix' of https://github.com/EAddario/llama.cpp into imatrix	2025-08-17 08:20:18 +01:00
Ed Addario	44ea7ddeac	Change statement order	2025-08-17 08:20:03 +01:00
Ed Addario	2e803234f4	Use { and } around conditionally-executed single line statements	2025-08-17 08:19:02 +01:00
Ed Addario	a96013f720	Define one variable per line and refactor names	2025-08-17 08:16:41 +01:00
Ed Addario	12607d3203	Use { and } around single line for statement	2025-08-17 08:10:54 +01:00
Ed Addario	d19e6c9afa	Use { and } around the conditionally-executed statement Co-authored-by: compilade <git@compilade.net>	2025-08-17 08:08:26 +01:00
Ed Addario	97d839c441	Using one line per variable definition Co-authored-by: compilade <git@compilade.net>	2025-08-17 08:06:15 +01:00
Ed Addario	4a487ea7e4	Use { and } around the conditionally-executed statement Co-authored-by: compilade <git@compilade.net>	2025-08-17 07:26:16 +01:00
Ed Addario	e3149a2168	Use the corresponding size Co-authored-by: compilade <git@compilade.net>	2025-08-17 07:24:27 +01:00
Tarek Dakhran	65349f26f2	model : support vision LiquidAI LFM2-VL family (#15347 ) * wip lfm2 vision model * Fix conv weight * Implement dynamic resolution * Fix cuda * support LFM2-VL-450M * happy CI * Remove extra `ggml_conv` and put others into the right place Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-08-16 23:33:54 +02:00
Ed Addario	d4b0d89115	Fix return type bug	2025-08-16 11:00:43 +01:00
Ed Addario	030ec53d7a	Remove unnecessary include	2025-08-16 10:46:09 +01:00
Ed Addario	8589ef4d15	Update README.md	2025-08-15 21:27:48 +01:00
Ed Addario	240a965e50	Update README.md	2025-08-15 21:24:38 +01:00
Ed Addario	42bfe3b2a3	Update stats output sort based on imatrix type	2025-08-15 21:12:56 +01:00
Ed Addario	2756617c3f	Merge branch 'master' into imatrix	2025-08-15 20:46:43 +01:00
Diego Devesa	f75b830647	chat : include kwargs in template example (#15309 )	2025-08-14 10:28:29 -07:00
Aldehir Rojas	b204a5a234	gpt-oss: implement harmony parsing (#15181 ) * model : add harmony parser for gpt-oss * gpt-oss : fix grammar trigger from causing empty stack * gpt-oss: tweak the grammar trigger again * gpt-oss : add support for recipient in role header * gpt-oss : fix ungrouped tool calls in grammar * gpt-oss : loosen function name matching during parse * gpt-oss : clean up workarounds * gpt-oss : add template tests * gpt-oss : simulate thinking and tool call tags * gpt-oss : undo think tags when reasoning_format is none * gpt-oss : set special tokens back to user defined * gpt-oss : update openai-gpt-oss template * server : filter out harmony thought messages * gpt-oss : simplify parsing	2025-08-14 17:23:11 +03:00
Georgi Gerganov	d32e03f449	server : add SWA checkpoints (#15293 ) * server : add SWA checkpoints ggml-ci * cont : server clean-up * server : handle state restore fails * llama : add extended llama_state_seq_ API * server : do not make checkpoints if --swa-full ggml-ci * llama : remove flags value for NONE * server : configure number of SWA checkpoints with CLI arg ggml-ci * args : fix scope of new argument	2025-08-14 14:59:50 +03:00
kallewoof	3ea913f1ce	perplexity: give more information about constraints on failure (#15303 ) * perplexity: give more information about constraints on failure This checks whether -np is insufficient vs context, and provides clues as to how much is needed for each. * log formatting * log error and return instead of storing max_seq_exceeded int * check if s0 is zero for -np check	2025-08-14 09:16:32 +03:00
Sigbjørn Skjæret	b3e16665e1	server : enable -td and -tbd parameters (#15172 )	2025-08-13 15:43:00 +02:00
Copilot	d8914fc47e	common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-moe-draft parameters (#15191 ) * Checkpoint from VS Code for coding agent session * Initial plan * Fix typo in --override-tensor-draft flag implementation * Add null termination for speculative tensor buffer overrides * Apply suggestions from code review * Apply suggestions from code review * Extract tensor override parsing logic to common function (addresses @slaren's feedback) * Apply suggestions from code review * Apply suggestions --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-08-13 12:44:40 +02:00
Aldehir Rojas	e885445bc1	server : filter out harmony thought messages (#15278 )	2025-08-13 12:28:21 +02:00
rainred	cf9e5648a7	mtmd : Fix MinicpmV model converter and clip to avoid using hardcode. (#14750 ) * Fix MinicpmV model converter and clip to avoid using hardcode. * Code update for pr/14750 * Remove unused field, update script path in docs. * Add version 5 for fallback code. --------- Co-authored-by: lzhang <zhanglei@modelbest.cn>	2025-08-11 16:12:12 +02:00
Xuan-Son Nguyen	53d0a12658	server : allow specifying reasoning_format in HTTP request (#15238 )	2025-08-11 14:48:41 +02:00
Daniel Bevenius	1ebbaddff2	perplexity : update comments/error msg to use decode [no ci] (#15227 ) This commit updates comments and error messages to use "decode" instead of "eval" in perplexity.cpp. The motivation for this is that `llama_eval` was renamed to `llama_decode` a while ago, but the comments and error messages still referred to "eval". This change ensures consistency and clarity.	2025-08-11 11:21:24 +03:00
Ed Addario	89051cda35	Update README.md	2025-08-09 14:49:44 +01:00
Ed Addario	dcac206f8e	Add --activation-statistics logic to avoid doubling the imatrix size by default	2025-08-09 14:49:25 +01:00
Ed Addario	6fe51e12f1	Fix typo in ECS formula	2025-08-09 09:12:23 +01:00
Ed Addario	59af5034f7	Update README.md	2025-08-09 01:26:23 +01:00
Ed Addario	c5ecdaa1a1	Add Euclidean–Cosine Score (ECS)	2025-08-07 19:04:49 +01:00
Ed Addario	5bb2def02d	Add --activation-statistics parameter	2025-08-07 17:41:21 +01:00
Ed Addario	dadd90ef73	Rename report heading	2025-08-07 14:07:48 +01:00
Ed Addario	e0d6471340	Reverse conditional logic to match convention	2025-08-07 12:04:52 +01:00
Ed Addario	3e9d53c61e	Refactor variable names	2025-08-07 12:03:24 +01:00
Ed Addario	c7959edff5	Merge branch 'master' into imatrix	2025-08-07 11:51:33 +01:00
Daniel Bevenius	36d3f00e14	requirements : fix PyTorch uint64 compatibility (#15134 ) This commit addresses an issue with the convert_hf_to_gguf script which is currently failing with: ```console AttributeError: module 'torch' has no attribute 'uint64' ``` This occurred because safetensors expects torch.uint64 to be available in the public API, but PyTorch 2.2.x only provides limited support for unsigned types beyond uint8 it seems. The torch.uint64 dtype exists but is not exposed in the standard torch namespace (see pytorch/pytorch#58734). PyTorch 2.4.0 properly exposes torch.uint64 in the public API, resolving the compatibility issue with safetensors. This also required torchvision to be updated to =0.19.0 for compatibility. Refs: https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/186#68938de803e47d990aa087fb Refs: https://github.com/pytorch/pytorch/issues/58734	2025-08-07 05:31:48 +02:00

229 Commits