llama.cpp

Commit Graph

Author	SHA1	Message	Date
Aman Gupta	908a9e5a1e	CUDA: disable cuda graph when using n-cpu-moe (#18593) * CUDA: disable cuda graph when using n-cpu-moe * call ggml_cuda_set_device	2026-01-05 01:37:48 +08:00
Aman Gupta	5126c41c1c	ggml-cuda: remove unused params in ggml_cuda_graph (#18579)	2026-01-05 01:37:09 +08:00
Aldehir Rojas	cef1d23c5a	common/grammar : replace problematic backtracking regex `[\s\S]` (#18342) grammar : add support for std::regex_search() with trigger patterns * common : update hermes2 pro trigger to search instead of match * common : use regex_search with anchoring for partial matching * common : adjust regex partial tests to use new pattern * grammar : check pattern directly instead of adding a type * common : adjust existing patterns to match new semantics	2026-01-03 16:02:43 -06:00
Georgi Gerganov	c69c7ebc90	graph : fix graph reuse logic when `n_pos_per_embd > 1` (#18566)	2026-01-03 23:59:06 +02:00
Aman Gupta	e57f52334b	ggml-cuda: fixes for concurrent streams (#18496)	2026-01-03 23:15:01 +08:00
Georgi Gerganov	a554a1ecc7	context : fix reserve token padding to n_seqs (#18536)	2026-01-03 15:45:34 +02:00
Johannes Gäßler	0f2e42ca1d	CUDA: only allocate FA tmp buffer if needed (#18564)	2026-01-03 13:55:53 +01:00
Imad Saddik	db8d1acd3a	chore: update webui build output	2026-01-03 11:46:18 +01:00
pl752	9dba9f5352	(Bugfix, ggml-cuda) Pool alloc count fix + small size computation type adjustment (#18559) * CUDA: Fixed obj byte size instead of obj count being passed to pool alloc (fattn-common, dst_tmp_meta) * CUDA: Explicitly cast some of the int alloc counts before multiplication in argsort --------- Co-authored-by: pl752 <maximpl752@gmail.com>	2026-01-03 11:13:40 +01:00
Imad Saddik	72af4199c4	Replace <label> with <Label>	2026-01-03 09:44:26 +01:00
Imad Saddik	7295444cd2	Fix autoChatWidth checkbox to reset customChatWidth when enabled	2026-01-03 09:41:02 +01:00
Imad Saddik	6487840b0a	Pass missing style prop	2026-01-03 09:39:45 +01:00
Imad Saddik	22731da153	chore: update webui build output	2026-01-03 09:23:16 +01:00
Imad Saddik	eb997f61f9	Add autoChatWidth and customChatWidth to syncable parameters	2026-01-03 09:21:55 +01:00
Imad Saddik	21b35be366	chore: update webui build output	2026-01-03 09:12:29 +01:00
Imad Saddik	c9b34bc00d	Replace getChatWidth utility with chatWidthClasses in chat components	2026-01-03 09:10:44 +01:00
Imad Saddik	b6536f6589	Format code	2026-01-03 08:41:30 +01:00
Imad Saddik	36f334f4af	Put the constant into constants/chat-width.ts	2026-01-03 08:40:21 +01:00
Imad Saddik	00e6cafda6	Rename component to ChatSettingsComboboxCustomWidth	2026-01-03 08:37:09 +01:00
Imad Saddik	0143112cf0	chore: update webui build output	2026-01-03 08:35:42 +01:00
Imad Saddik	d7528b41fa	Merge remote-tracking branch 'upstream/master' into feat/change_chat_screen_width	2026-01-03 08:34:29 +01:00
Shouyu	bcfc8c3cec	ggml-hexagon: optimize activation function (#18393) * refactor: refactor silu * refactor: optimize swiglu * refactor: remove unnecessary if in swiglu * refactor: refactor swiglu_oai * chore: fix formatting issue	2026-01-02 21:24:24 -08:00
Jeff Bolz	18ddaea2ae	vulkan: Optimize GGML_OP_CUMSUM (#18417) * vulkan: Optimize GGML_OP_CUMSUM There are two paths: The preexisting one that does a whole row per workgroup in a single shader, and one that splits each row into multiple blocks and does two passes. The first pass computes partials within a block, the second adds the block partials to compute the final result. The multipass shader is used when there are a small number of large rows. In the whole-row shader, handle multiple elements per invocation. * use 2 ELEM_PER_THREAD for AMD/Intel * address feedback	2026-01-02 15:32:30 -06:00
Jeff Bolz	706e3f93a6	vulkan: Implement mmvq for iq1_s/iq1_m (#18450)	2026-01-02 20:19:04 +01:00
Prabod	5755e52d15	model : Maincoder-1B support (#18534) * Add Maincoder model support * Removed SPM model vocabulary setting and MOE related GGUF parameters Removed trailing spaces from maincoder.cpp * removed set_vocab * added new line * Fix formatting * Add a new line for PEP8	2026-01-02 20:11:59 +01:00
Georgi Gerganov	f38de16341	metal : adjust extra size for FA buffer to avoid reallocations (#18545)	2026-01-02 19:02:18 +02:00
Georgi Gerganov	af1e8e1a6c	graph : reduce topology branching (#18548)	2026-01-02 19:01:56 +02:00
Georgi Gerganov	d84a6a98be	vocab : reduce debug logs about non-EOG control tokens (#18541) * vocab : reduce debug logs about non-EOG control tokens * cont : add comment	2026-01-02 16:17:33 +02:00
Chris Rohlf	c6f0e832da	rpc : use unordered_map::reserve and emplace (#18513)	2026-01-02 12:09:36 +02:00
MeeMin	e86f3c2221	cuda : fix copy of large tensors (ggml_nbytes <= INT_MAX assertion) (#18433) * ggml-cuda: fixed assertion in ggml_cuda_cpy (#18140) * ggml-cuda: changes in data types to int64_t * ggml-cuda: added asserts for CUDA block numbers * ggml-cuda: changed the condition for y and z dimension	2026-01-02 00:24:20 +01:00
Sigbjørn Skjæret	169ee68ffb	model : remove modern-bert iswa template (#18529) * remove modern-bert iswa template * forgotten	2026-01-02 00:06:42 +01:00
tt	ced765be44	model: support youtu-vl model (#18479) * Support Youtu-VL Model * merge code * fix bug * revert qwen2 code & support rsplit in minja.hpp * update warm info * fix annotation * u * revert minja.hpp * fix * Do not write routed_scaling_factor to gguf when routed_scaling_factor is None * fix expert_weights_scale * LGTM after whitespace fixes * fix * fix * fix * layers to layer_index * enum fix --------- Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-01 19:25:54 +01:00
Piotr Wilkin (ilintar)	3ccccc83f7	Add conversion support for IQuestCoderForCausalLM (#18524)	2026-01-01 18:45:55 +01:00
o7si	d0a6a31470	model : add support for JinaBertModel with non-gated ffn (#18475) * WIP: Initial commit for fixing JinaBert original FF type support * convert: add jina-v2-de tokenizer variant for German_Semantic_V3 * convert: fix token collision in BERT phantom vocab conversion * convert: add feed_forward_type metadata * model: add feed_forward_type metadata for jina-bert-v2 * model: jina-bert-v2 support standard GELU FFN variant * model: remove ffn_type, detect FFN variant from tensor dimensions * Update src/llama-model.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/llama-model.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/models/bert.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/models/bert.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * revert collision fix to be handled in separate PR --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-01 18:38:51 +01:00
o7si	2b2afade9f	convert : fix encoding of WPM vocab for BERT models (#18500) * convert: avoid token collision when stripping ## prefix * convert: use token types for BERT special tokens check * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-01 18:27:07 +01:00
HelloKS	f4f5019254	model: add Solar Open model (#18511) * model: add Solar-Open model * vocab: add solar-open to end eog blacklist * model: add proper llm type * chat: basic template for solar open * typo: fix comment about vocab * convert: suggested changes * convert: suggested changes * chat: change reasoning end tag for solar-open * llama-chat: add solar-open template	2026-01-01 18:01:43 +01:00
Anri Lombard	d5574c919c	webui: fix code copy stripping XML/HTML tags (#18518) * webui: fix code copy stripping XML/HTML tags * webui: update static build	2026-01-01 13:44:11 +01:00
Aman Gupta	26831bded9	ggml-cuda: remove unnecessary prints on ggml_cuda_init (#18502)	2026-01-01 19:18:43 +08:00
Jeff Bolz	be47fb9285	vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (#18295) * vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron Also handle GGML_OP_SCALE at the end (nemotron, deepseek2). Fewer pipeline variants and spec constants, just use push constants. In test_topk_moe, change exp_probs_b to be 1D, matching real networks. Update test-backend-ops and ggml-backend to allow verifying multiple outputs in a fusion test (topk_moe has two outputs). Previously only the final node was verified. * change test_topk_moe to allow results in arbitrary order * disable sigmoid fusion for moltenvk	2026-01-01 08:58:27 +01:00
triplenom	9e10bd2eaf	llama: handle short reads in direct I/O path (#18504)	2026-01-01 10:24:43 +08:00
Anri Lombard	4cd162a123	chat: make tool description and parameters optional per OpenAI spec (#18478) * chat: make tool description and parameters optional per OpenAI spec Per the OpenAI API specification, both 'description' and 'parameters' fields in tool function definitions are optional. Previously, the parser would throw an exception if these fields were missing. Attempts to fix #17667 * refactor: use value() for cleaner optional field access	2025-12-31 17:21:37 -06:00
Georgi Gerganov	13814eb370	sync : ggml	2025-12-31 18:54:43 +02:00
Georgi Gerganov	54f67b9b66	ggml : bump version to 0.9.5 (ggml/1410)	2025-12-31 18:54:43 +02:00
Anri Lombard	33ded988ba	quantize: prevent input/output file collision (#18451) Check if input and output files are the same before quantizing to prevent file corruption when mmap reads from a file being written to. Fixes #12753	2025-12-31 23:29:03 +08:00
Sigbjørn Skjæret	0db8109849	convert : lint fix (#18507)	2025-12-31 14:28:21 +01:00
Henry147147	9b8329de7a	mtmd : Adding support for Nvidia Music Flamingo Model (#18470) * Initial commit, debugging q5_k_s quant * Made hf_to_gguf extend whisper to reduce code duplication * addressed convert_hf_to_gguf pull request issue --------- Co-authored-by: Henry D <henrydorsey147@gmail.com>	2025-12-31 12:13:23 +01:00
gatbontonpc	9a6369bb60	metal : add count_equal op (#18314) * add count equal for metal * remove trailing whitespace * updated doc ops table * changed shmem to i32 * added multi tg and templating * removed BLAS support from Metal docs * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add memset to set dst to 0 * metal : cleanup --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-31 10:39:48 +02:00
Johannes Gäßler	ecc343de63	CUDA: fix KQ max calculation (#18487)	2025-12-31 09:37:00 +01:00
Georgi Gerganov	01ade96e71	metal : remove BF16 x F16 kernels (#18456)	2025-12-31 09:53:48 +02:00
Aman Gupta	7bcaf815c2	sycl: add newline at the end of CMakeLists.txt (#18503)	2025-12-31 14:23:44 +08:00
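The two-pass path described in the Vulkan `GGML_OP_CUMSUM` commit (18ddaea2ae) splits a row into blocks, computes prefix sums within each block, and then adds the accumulated block totals to produce the final result. A minimal CPU-side sketch of that idea follows, assuming an inclusive prefix sum; it is not the shader code, and the names `cumsum_two_pass` and `block_size` are hypothetical. Each loop iteration stands in for work a separate GPU workgroup would do.

```python
from itertools import accumulate


def cumsum_two_pass(row, block_size):
    """Inclusive prefix sum of `row`, computed block by block (hypothetical sketch)."""
    # Split the row into fixed-size blocks; the last block may be shorter.
    blocks = [row[i:i + block_size] for i in range(0, len(row), block_size)]

    # Pass 1: each block computes its own local prefix sums independently,
    # as separate workgroups would on a GPU.
    partials = [list(accumulate(b)) for b in blocks]

    # Block offsets: running total of all preceding blocks' sums.
    # A block's sum is the last element of its local prefix sums.
    offsets = [0]
    for p in partials[:-1]:
        offsets.append(offsets[-1] + p[-1])

    # Pass 2: add each block's offset to its local partial sums.
    out = []
    for off, p in zip(offsets, partials):
        out.extend(x + off for x in p)
    return out


print(cumsum_two_pass([1, 2, 3, 4, 5], 2))  # [1, 3, 6, 10, 15]
```

Here the block offsets are accumulated serially for clarity; a GPU implementation would parallelize that step as well. Per the commit message, the multipass path pays off only when there are a small number of large rows, since otherwise one workgroup per row already keeps the device busy.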

8413 Commits