* Change the log level from debug to warn, to explain the reason for stopping.
* Update tools/main/main.cpp
Fix printing --2
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* metal : optimize FA vec for large heads and sequences
* metal : adjust small-batch mul mv kernels
ggml-ci
* batched-bench : fix total speed computation (see the sketch below)
ggml-ci
* cont : add comments
ggml-ci
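For context on the fix above: total throughput in a batched run is the total token count divided by the total wall time, not a sum of per-phase speeds. A minimal sketch of that arithmetic (names are hypothetical, not the actual batched-bench code):

```cpp
// Illustrative only: combined speed over the prompt-processing (pp) and
// token-generation (tg) phases of a batched run.
// n_pp/n_tg are token counts, t_pp/t_tg elapsed seconds per phase.
static double total_speed(int n_pp, int n_tg, double t_pp, double t_tg) {
    // total tokens over total time -- averaging s_pp and s_tg instead
    // would overweight the faster phase
    return (n_pp + n_tg) / (t_pp + t_tg);
}
```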
* convert : fix tensor naming conflict for llama 4 vision
* convert ok
* support kimi vision model
* clean up
* fix style
* fix calculation of the number of output tokens
* refactor resize_position_embeddings
* add test case
* rename build fn
* correct a small bug
- Use server_tokens in more places in server and util.cpp
- Convert most functions that used llama_tokens to server_tokens
- Modify input tokenizer to handle JSON objects as subprompts
- Break out MTMD prompt parsing into utility function
- Support JSON objects with `multimodal_data` arrays for MTMD prompts, alongside the other existing prompt types (see the sketch after this list)
- Add a capability to the model endpoint indicating whether the client can send multimodal data
- Add tests.
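A hedged sketch of what such a request could look like when built with nlohmann::json (which the server already uses for its HTTP API). The `multimodal_data` field name follows the list above; the exact schema shown here is illustrative, not the server's authoritative one:

```cpp
#include <nlohmann/json.hpp>
#include <string>

using json = nlohmann::json;

// Build a completion request whose prompt array mixes a plain-text
// subprompt with a JSON-object subprompt carrying base64-encoded media.
// The shape is illustrative; field names mirror the list above.
static json make_mtmd_request(const std::string & text, const std::string & image_b64) {
    return json{
        {"prompt", json::array({
            "Describe the attached image:",        // plain-text subprompt
            json{                                  // JSON-object subprompt
                {"prompt",          text},
                {"multimodal_data", json::array({image_b64})},
            },
        })},
    };
}
```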
* Fix -Werror=return-type so ci/run.sh can run
* Update tools/mtmd/clip.cpp
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* Remove false now that we have abort
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* Update docker.yml
Modify the contents of docker.yml so that the workflow stops running periodically; if you want to run the workflow, you can trigger it manually.
* feat: Modify the header file include path
1. There is no `llava` directory in the `tools` directory.
2. Because the `mtmd` CMakeLists.txt uses `target_include_directories(mtmd PUBLIC .)`, any target that links against `mtmd` automatically gets the `mtmd` directory as a header search path. Therefore `target_include_directories(${TARGET} PRIVATE ../llava)` can be removed, or replaced with `target_include_directories(${TARGET} PRIVATE ../mtmd)` to explicitly require the `llama-server` target to use header files from `mtmd`.
* Restore the docker.yml file
Add tracking of high-watermark cache usage and expose it via the /metrics endpoint.
Use case: track the largest cache usage needed under a realistic workload,
to better understand memory requirements and be able to adjust the
cache size/quantization for the model/cache accordingly.
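Conceptually the high watermark is just a running maximum that is updated on every cache allocation and reported as a gauge by the metrics handler. A generic sketch of the pattern (not the server's actual code):

```cpp
#include <atomic>
#include <cstddef>

// Running maximum of observed cache usage, safe under concurrent updates.
// Generic sketch of the high-watermark pattern, not llama-server's code.
struct cache_watermark {
    std::atomic<size_t> high{0};

    void observe(size_t used) {
        size_t cur = high.load(std::memory_order_relaxed);
        // classic fetch-max loop: retry until high >= used
        while (used > cur &&
               !high.compare_exchange_weak(cur, used, std::memory_order_relaxed)) {
        }
    }

    size_t value() const { return high.load(std::memory_order_relaxed); }
};
```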
* wip lfm2 vision model
* Fix conv weight
* Implement dynamic resolution
* Fix CUDA
* support LFM2-VL-450M
* happy CI
* Remove extra `ggml_conv` and put the others in the right place
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* model : add harmony parser for gpt-oss (see the format sketch at the end of this list)
* gpt-oss : fix grammar trigger from causing empty stack
* gpt-oss : tweak the grammar trigger again
* gpt-oss : add support for recipient in role header
* gpt-oss : fix ungrouped tool calls in grammar
* gpt-oss : loosen function name matching during parse
* gpt-oss : clean up workarounds
* gpt-oss : add template tests
* gpt-oss : simulate thinking and tool call tags
* gpt-oss : undo think tags when reasoning_format is none
* gpt-oss : set special tokens back to user defined
* gpt-oss : update openai-gpt-oss template
* server : filter out harmony thought messages
* gpt-oss : simplify parsing
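For reference on the format the gpt-oss commits above deal with: harmony output delimits each message with special tokens, e.g. `<|channel|>analysis<|message|>…<|end|>`, and the parser has to recover the channel (plus an optional recipient for tool calls) and the body. The token spellings follow OpenAI's published harmony format; the split below is a simplified illustration, not the parser added in these commits:

```cpp
#include <string>
#include <utility>

// Split one harmony segment such as
//   "<|channel|>analysis<|message|>some thinking...<|end|>"
// into its channel header and message body. Simplified illustration only.
static std::pair<std::string, std::string> split_harmony(const std::string & seg) {
    const std::string k_channel = "<|channel|>";
    const std::string k_message = "<|message|>";
    const std::string k_end     = "<|end|>";

    const size_t c = seg.find(k_channel);
    const size_t m = seg.find(k_message);
    if (c == std::string::npos || m == std::string::npos || m < c) {
        return {"", seg}; // no channel header: treat the whole segment as content
    }
    const size_t ch_beg   = c + k_channel.size();
    const size_t body_beg = m + k_message.size();
    const size_t body_end = seg.find(k_end, body_beg);

    std::string channel = seg.substr(ch_beg, m - ch_beg);
    std::string body    = seg.substr(body_beg,
        body_end == std::string::npos ? std::string::npos : body_end - body_beg);
    return {channel, body};
}
```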