llama.cpp

Commit Graph

Author	SHA1	Message	Date
HelloKS	9d52f17ae3	model : add KORMo model (#18032 ) * vocab: add KORMo Tokenizer * model: add KORMoForCausalLM * vocab: change pretokenizer to qwen2 * lint: fix unintended line removal * model: make qwen2 bias tensor optional * model: use qwen2 architecture for KORMo	2025-12-15 18:51:43 +01:00
ssweens	4529c660c8	kv-cache: Fix state restore fragmented cache (#17982 ) * kv-cache : fix state restore with fragmented cache (#17527) Change find_slot to allow non-contiguous allocation during state restore. Fixes 'failed to find available cells in kv cache' error when restoring state to fragmented cache. * tests : update logic * cleanup: tightened state_read_meta sig, added is_contiguous case * fix: state_read_meta arg reorder loose ends --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-15 19:28:35 +02:00
Pascal	0f4f35e7be	Fix unreadable user markdown colors and truncate long texts in deletion dialogs (#17555 ) * webui: limit conversation name length in dialogs * webui: fix unreadable colors on links and table cell hover in user markdown * webui: keep table borders visible in user markdown * webui: updating unified exports * Update tools/server/webui/src/lib/components/app/chat/ChatAttachments/ChatAttachmentThumbnailFile.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * chore: update webui build output * chore: update webui build output * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-12-15 16:34:53 +01:00
Jeremy Demeule	165caaf5fb	metal: use shared buffers on eGPU (#17866 ) * metal: use shared buffers on eGPU With #15906, I noticed on important regression when using metal backend on eGPU. This commit restore the previous behavior and add an option to force its activation. * metal: use shared buffers on eGPU * metal: use shared buffers on eGPU	2025-12-15 16:14:49 +02:00
Xuan-Son Nguyen	96a181a933	mtmd: refactor audio preprocessing (#17978 ) * mtmd: refactor audio preprocessing * refactor Co-authored-by: Tarek <tdakhran@users.noreply.github.com> * wip * wip (2) * improve constructor * fix use_natural_log * fix padding for short input * clean up * remove need_chunking --------- Co-authored-by: Tarek <tdakhran@users.noreply.github.com>	2025-12-15 14:16:52 +01:00
Andrew Aladjev	4a4f7e6550	cli: fixed dead links to tools/main for cli and completion, fixed code owners (#17993 ) Co-authored-by: Andrew Aladjev <andrew.aladjev@gmail.com>	2025-12-15 11:47:04 +01:00
Thomas Jarosch	e73d548659	webui: add "delete all conversations" button to import/export tab (#17444 ) * webui: add "delete all conversations" button to import/export tab - Add 'Delete all conversations' functionality with confirmation dialog - Add Trash icon and destructive styling for clear visual indication - Redirects to "?new_chat=true#/" by using conversationsStore.deleteAll() * chore: update webui build output	2025-12-15 11:29:29 +01:00
Johannes Gäßler	b1f3a6e5db	llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653 ) * llama: automatically fit args to free memory llama-fit-params tool * fix CI * hints for bug reports, ensure no reallocation * fix segfault with Vulkan * add llama-fit-params to CI * fix CI * fix CI * fix CI * minor adjustments * fix assignment of 1 dense layer * fix logger not being reset on model load failure * remove --n-gpu-layer hint on model load failure * fix llama-fit-params verbosity * fix edge case * fix typo [no ci]	2025-12-15 09:24:59 +01:00
Neo Zhang Jianyu	4aced7a631	[SYCL] Support gpt-oss by OPs add-id, mul_mat for mxfp4, swiglu_oai (#17826 ) * support gpt-oss GPU by OP add-id, mul_mat for mxfp4, swiglu_oai, fix warning * fix fault ut case, update ops.md * rebase, fix format issue	2025-12-15 10:35:15 +08:00
piDack	745fa0e78b	model : add glm-asr support (#17901 ) * [model] add glm-asr support * fix format for ci * fix convert format for ci * update glm_asr convert script & use build_ffn for glm_asr clip & use build_stack for padding and review * check root architecture for convert hf script * fix conficlt with upstream * fix convert script for glm asr & format clip-impl * format * restore hparams text * improved conversion --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-15 03:18:46 +01:00
Xuan-Son Nguyen	52392291b2	preset: handle negated arg, reverse the meaning if needed (#18041 )	2025-12-14 22:08:10 +01:00
Sigbjørn Skjæret	5c8a717128	convert : refactor rope scaling handling (#18013 ) * refactor rope scaling handling * ws-- * missed a couple * use find_hparam	2025-12-14 16:04:37 +01:00
Haowei Wu	37f5a1093b	mtmd: enhance image resizing in llava_uhd (#18014 )	2025-12-14 15:57:52 +01:00
Ruben Ortlam	9e6649ecf2	vulkan: fix mul_mat_vec_iq1_s formatting (#18026 )	2025-12-14 14:52:46 +01:00
Xuan-Son Nguyen	0759b09c90	graph: add f_attn_temp_offset (#18025 )	2025-12-14 13:05:59 +01:00
Georgi Gerganov	254098a279	common : refactor common_sampler + grammar logic changes (#17937 ) * common : refactor common_sampler + grammar logic changes * tests : increase max_tokens to get needed response * batched : fix uninitialized samplers	2025-12-14 10:11:13 +02:00
Jeff Bolz	3238b1400c	vulkan: Fix data race/hang in scalar/cm1 flash attention (#17887 )	2025-12-14 09:00:00 +01:00
lovedheart	4722671641	vulkan: improve mul_mat_vec_iq1_s speed (#17874 )	2025-12-14 08:47:49 +01:00
Eve	d15d177f43	vulkan: faster q6_k matmul (#17813 ) * q6_k faster mul mat * 8 values * fix comment * switch to two at a time * start ci for .glsl files	2025-12-14 08:29:37 +01:00
Georgi Gerganov	77ad8542bd	model-conversion : cast logits to float32 (#18009 )	2025-12-14 08:58:13 +02:00
Georgi Gerganov	609a2d0268	models : fix YaRN regression + consolidate logic (#18006 ) * models : fix YaRN regression + consolidate logic * cont : fix the fix * cont : remove header * cont : add header	2025-12-14 08:34:56 +02:00
Georgi Gerganov	a63cbafbbc	ggml : arm repack fix build	2025-12-14 08:33:51 +02:00
Georgi Gerganov	0e59224990	sync : ggml	2025-12-14 08:33:51 +02:00
Georgi Gerganov	71fdcf0616	ggml : arm repack fix build (whisper/0)	2025-12-14 08:33:51 +02:00
Congcong Cai	615655aafe	cmake : set `CMAKE_RUNTIME_OUTPUT_DIRECTORY` for non standalone build (ggml/1394) Some backend depends on CMAKE_RUNTIME_OUTPUT_DIRECTORY to create temporary file like metal backened. Missing CMAKE_RUNTIME_OUTPUT_DIRECTORY will cause some cmake error like permission denied (try to copy file to root). This PR wants to setup a default path for CMAKE_RUNTIME_OUTPUT_DIRECTORY when it does not exist.	2025-12-14 08:33:51 +02:00
Xuan-Son Nguyen	c00ff929dc	scripts: add script to compare logprobs of llama.cpp against other frameworks (#17947 ) * scripts: add script to compare logits of llama.cpp against other frameworks * accept custom prompt file * fix code style * clarify endpoint * fix displaying * use abs for diff * fix vllm case * rm output file * rename to compare-logprobs * add "pattern"	2025-12-13 22:33:29 +01:00
Sergey Fedorov	4ed2bae50d	server-models.cpp: add missing <filesystem> (#18000 ) Fixes: https://github.com/ggml-org/llama.cpp/issues/17999	2025-12-13 22:02:43 +01:00
Jeff Bolz	5266379bca	llama_context: synchronize before reallocating output buffer (#17974 )	2025-12-13 09:19:51 -06:00
Xuan-Son Nguyen	4d5ae24c0a	arg: fix common_params_parse not accepting negated arg (#17991 )	2025-12-13 12:53:37 +01:00
Gustavo Rocha Dias	66ba51252e	cmake: correct scope - link ws2_32 for MinGW/w64devkit builds in cpp-httplib (#17972 ) * fix - w64devkit build * fix - w64devkit build private scope	2025-12-13 12:46:36 +01:00
Jeff Bolz	36255a2268	vulkan: support get_rows for i32 (#17941 )	2025-12-13 10:12:53 +01:00
Jeff Bolz	3229a23fa6	vulkan: support GGML_OP_DIAG (#17893 )	2025-12-13 10:07:49 +01:00
Jeff Bolz	303f8615e9	vulkan: Multi-pass softmax for large number of cols (#17892 ) When the number of cols is large, split each row across multiple workgroups. There are three phases that communicate partial results through temp buffers: (1) compute max partials (2) take max of partials, compute sum(exp(x-max)) partials (3) sum partials, compute scaled result	2025-12-13 10:04:29 +01:00
Georgi Gerganov	3c6391e748	speculative-simple : free batch on exit (#17985 )	2025-12-13 09:48:34 +02:00
Sigbjørn Skjæret	8e4d678528	common : skip model validation when --completion-bash is requested (#17975 )	2025-12-13 08:40:50 +01:00
Jeff Bolz	07a10c1090	vulkan: Allow non-pow2 n_experts in topk_moe (#17872 )	2025-12-13 08:40:04 +01:00
Sigbjørn Skjæret	2bc94e7928	add llama-completion to completion-bash executables (#17976 )	2025-12-13 08:35:50 +01:00
Daniel Bevenius	fd1085ffb7	model-conversion : use CONVERTED_MODEL value for converted model [no ci] (#17984 ) * model-conversion : use CONVERTED_MODEL value for converted model [no ci] This commit updates the model verification scripts to use the CONVERTED_MODEL environment variable instead of using the MODEL_PATH (the original model path) as the basis for the converted model file name. The motivation for this that currently if the converted model file name differs from the original model directory/name the verification scripts will look for the wrong .bin files that were generating when running the models. For example, the following steps were not possible: ```console (venv) $ huggingface-cli download google/gemma-3-270m-it --local-dir ggml-org/gemma-3-270m (venv) $ python3 convert_hf_to_gguf.py ggml-org/gemma-3-270m --outfile test-bf16.gguf --outtype bf16 (venv) $ cd examples/model-conversion/ (venv) $ export MODEL_PATH=../../ggml-org/gemma-3-270m (venv) $ export CONVERTED_MODEL=../../test-bf16.gguf (venv) $ make causal-verify-logits ... Data saved to data/llamacpp-test-bf16.bin Data saved to data/llamacpp-test-bf16.txt Error: llama.cpp logits file not found: data/llamacpp-gemma-3-270m.bin Please run scripts/run-converted-model.sh first to generate this file. make: *** [Makefile:62: causal-verify-logits] Error 1 ``` With the changes in this commit, the above steps will now work as expected.	2025-12-13 08:34:26 +01:00
Xuan-Son Nguyen	380b4c984e	common: support negated args (#17919 ) * args: support negated args * update docs * fix typo * add more neg options * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * rm duplicated arg * fix LLAMA_ARG_NO_HOST * add test --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-12 23:58:53 +01:00
Xuan-Son Nguyen	e39a2ce66d	clip: move model cgraphs into their own files (#17965 ) * clip: move model cgraphs into their own files * more explicit enums * fix linux build * fix naming * missing headers * nits: add comments for contributors	2025-12-12 21:14:48 +01:00
jiahao su	a8c7f33d79	ci : change the cann version and the container pull method (#17953 ) fix error format Update build.yml Remove unnecessary zip files fix update	2025-12-12 20:43:00 +01:00
Sigbjørn Skjæret	b7f5f46e03	docker : include legacy llama-completion binary (#17964 )	2025-12-12 19:39:23 +01:00
Johannes Gäßler	482211438d	CUDA: fix overflow in MMA kernel without stream-k (#17939 )	2025-12-12 17:43:58 +01:00
Georgi Gerganov	7bed317f53	models : fix the attn_factor for mistral3 graphs + improve consistency (#17945 ) * models : fix the attn_factor for mistral3 graphs * cont : rework attn_factor correction logic * cont : make deepseek2 consistent * cont : add TODO * cont : special-case DSv2 * cont : revert Mistral 3 Large changes * cont : fix DS2 to use the original attn_factor * cont : minor comments	2025-12-12 17:12:40 +02:00
Sigbjørn Skjæret	dcb7d17758	cann : fix ops broken by circular padding guard (#17825 )	2025-12-12 15:49:27 +01:00
ixgbe	51604435e8	ggml-cpu : fix RISC-V Q4_0 repack select and RVV feature reporting (#17951 ) * ggml-cpu:fix RISC-V Q4_0 repack select and RVV feature reporting Signed-off-by: Wang Yang <yangwang@iscas.ac.cn> * using the name VLEN instead of CNT * Update ggml/include/ggml-cpu.h --------- Signed-off-by: Wang Yang <yangwang@iscas.ac.cn> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-12 16:26:03 +02:00
Xuan-Son Nguyen	17158965ac	mtmd: explicitly forbidden inclusion of private header and libcommon (#17946 )	2025-12-12 15:16:06 +01:00
Aleksander Grygier	12280ae905	webui: Fix parsing non-LaTeX occurrencies of `\(` or `\)` (#17810 ) * fix: Improve latex protection logic to prevent turning non-latex `\(` into `$` * chore: update webui build output	2025-12-12 15:13:36 +01:00
Xuan-Son Nguyen	54a0fee4b7	arg: add -mm and -mmu as short form of --mmproj and --mmproj-url (#17958 ) * arg: add -mm and -mmu as short form of --mmproj and --mmproj-url * correct order * update docs	2025-12-12 14:06:06 +01:00
Daniel Bevenius	dada4c846d	model-conversion : remove max diff check in compare-logits [no ci] (#17954 ) This commit removes the maximum difference check from the compare-logits.py which would stop early if the difference between the logits exceeded a threshold. The motivation for removing this is that it can be useful to be able to get the complete log for debugging/reporting purposes.	2025-12-12 13:25:16 +01:00
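
Note on 303f8615e9 (vulkan: Multi-pass softmax for large number of cols): the three phases listed in that commit message can be illustrated with a small CPU-side sketch. This is not the Vulkan shader from the commit. It is a plain Python illustration, and the function name `multipass_softmax`, the `num_chunks` parameter, and the use of list slices to stand in for workgroups are all assumptions made here for clarity.

```python
import math


def multipass_softmax(row, num_chunks):
    """Softmax of one non-empty row, computed in three phases over chunks.

    Hypothetical illustration only: each chunk stands in for one workgroup,
    and the Python lists of partials stand in for the temp buffers.
    """
    if not row or num_chunks < 1:
        raise ValueError("row must be non-empty and num_chunks >= 1")

    size = math.ceil(len(row) / num_chunks)
    chunks = [row[i:i + size] for i in range(0, len(row), size)]

    # Phase 1: every chunk computes the max of its own slice.
    max_partials = [max(c) for c in chunks]

    # Phase 2: reduce the max partials to the row max, then every chunk
    # computes its partial sum of exp(x - row_max).
    row_max = max(max_partials)
    sum_partials = [sum(math.exp(x - row_max) for x in c) for c in chunks]

    # Phase 3: sum the partials and scale every element by the total.
    total = sum(sum_partials)
    return [math.exp(x - row_max) / total for x in row]
```

Subtracting the row max before exponentiating keeps the intermediate values bounded, which is presumably why the max has to be reduced across all chunks before any partial sums are formed.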


7414 Commits
