Commit Graph

5771 Commits

Georgi Gerganov 6179578988
batch : require non-coupled batch with sequential split_equal
ggml-ci
2025-06-25 17:20:46 +03:00
Georgi Gerganov 5eb1a88dc0
batch : optional requirement for sequential sequence ids
ggml-ci
2025-06-25 17:02:38 +03:00
Georgi Gerganov 6663128448
kv-cache : rework kv_idxs, support seq_cp
ggml-ci
2025-06-25 14:48:47 +03:00
Georgi Gerganov 0bb1da5854
kv-cache : simplify set_rows logic
ggml-ci
2025-06-24 23:24:00 +03:00
Georgi Gerganov 165d822044
graph : support iSWA virtual sequences
ggml-ci
2025-06-24 20:35:16 +03:00
Georgi Gerganov 1b74b9d73b
ggml : extend support for n_seq for soft_max and fattn
ggml-ci
2025-06-24 20:23:56 +03:00
Georgi Gerganov 8c68219835
kv-cache : fix non-FA path with virtual sequences
ggml-ci
2025-06-24 20:01:10 +03:00
Georgi Gerganov 7c6487b22f
metal : extend ggml_soft_max_ext() to support n_seq dim 2025-06-24 20:00:40 +03:00
Georgi Gerganov 401c13e3c3
cont : fix build
ggml-ci
2025-06-24 15:59:47 +03:00
Georgi Gerganov 132143938f
tools : tmp adjustments (TMP)
ggml-ci
2025-06-24 15:21:35 +03:00
Georgi Gerganov 52b9007176
llama : add "virtual sequences"
ggml-ci
2025-06-24 15:02:52 +03:00
Georgi Gerganov 36f8e20d08
kv-cache : utilize ggml_set_rows broadcast
ggml-ci
2025-06-23 13:22:51 +03:00
Georgi Gerganov 332f073589
cont : support non-continuous slots
ggml-ci
2025-06-23 13:22:47 +03:00
Georgi Gerganov 39d0b1e8df
cont : kv-cells cp/set for non-cont slots
ggml-ci
2025-06-23 13:21:37 +03:00
Georgi Gerganov f875d6cb72
cont : migrate to using set of indices instead of slot head
ggml-ci
2025-06-23 13:21:36 +03:00
Georgi Gerganov db2bb378b1
cont : gate the ggml_set_rows usage with env var
ggml-ci
2025-06-23 13:21:36 +03:00
Georgi Gerganov 79dac3c861
kv-cache : use ggml_set_rows
ggml-ci
2025-06-23 13:21:36 +03:00
Radoslav Gerganov 1f647b5992
ggml : fix supports_op 2025-06-23 13:21:36 +03:00
Radoslav Gerganov eba97574da
ggml : simplify forward_dup_f32 2025-06-23 13:21:36 +03:00
Georgi Gerganov c0cfc2f78b
metal : add ggml_set_rows implementation
ggml-ci
2025-06-23 13:21:36 +03:00
Georgi Gerganov 828e5d2fcd
tests : add ggml_set_rows 2025-06-23 13:21:35 +03:00
Georgi Gerganov e73690a69d
ggml : ggml_set_rows update comment + better index name 2025-06-23 13:21:35 +03:00
Georgi Gerganov e89709721b
ggml : support GGML_TYPE_F32 ".from_float" trait 2025-06-23 13:21:35 +03:00
Georgi Gerganov 630c84a2bd
ggml : ggml_set_rows support quantized dst
ggml-ci
2025-06-23 13:21:35 +03:00
Georgi Gerganov df71c803b4
ggml : ggml_set_rows support broadcast 2025-06-23 13:21:35 +03:00
Georgi Gerganov 313a444b22
ggml : add ggml_is_contiguous_rows 2025-06-23 13:21:35 +03:00
Georgi Gerganov 695b6b7025
ggml : add repeat impl for i64 2025-06-23 13:21:34 +03:00
Radoslav Gerganov f2cd962fe2
use I64 for indices 2025-06-23 13:21:34 +03:00
Radoslav Gerganov c1a581a10b
ggml : add ggml_set_rows
Add ggml_set_rows(a, b, c) which copies rows from 'b' into 'a' using
indices from 'c'.

ref: #8366
2025-06-23 13:21:32 +03:00
Georgi Gerganov 7b50d589a8
kv-cells : fix tracking of seq_pos (#14339)
* kv-cells : fix tracking of seq_pos during cache reuse

ggml-ci

* cont : improve error message

ggml-ci

* cont : add more comments
2025-06-23 12:27:35 +03:00
Jeff Bolz 3a9457df96
vulkan: update windows SDK in CI (#14334) 2025-06-23 10:19:24 +02:00
Ed Addario fa4a9f2a1c
quantize : handle user-defined pruning of whole layers (blocks) (#13037) 2025-06-22 23:16:26 +02:00
Sigbjørn Skjæret 238005c2dc
gguf-py : fix SpecialVocab parsing when post_processor is null (#14330) 2025-06-22 19:46:17 +02:00
Ruikai Peng 66aba7aca9
run : avoid double tokenization (#14327)
* run : avoid double tokenization by adopting common_tokenize heuristic

* build : fix windows gcc and clang warnings

* lint : fixed trailing whitespace

* run : fix is_first flag
2025-06-23 01:28:06 +08:00
Georgi Gerganov f1f5e82df6
examples : fix is_first logic for tokenization (#14329)
ggml-ci
2025-06-22 20:10:07 +03:00
uvos af3373f1ad
HIP: enable vec fattn on RDNA4 (#14323) 2025-06-22 16:51:23 +02:00
yuiseki 5d5c066de8
mtmd : fix Pixtral OOM with large images by capping image_size to 1024 (#14326)
Mistral Small 2506 models using Pixtral vision encoder were running out
of GPU memory when processing images larger than 1024x1024 pixels due to
exponential memory growth from unlimited image size.

This fix applies the same 1024x1024 limit used by Qwen2VL models to
prevent OOM issues while maintaining compatibility with existing models.
2025-06-22 14:44:57 +02:00
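The fix above caps the longer image dimension at 1024 pixels before the vision encoder runs. A hedged sketch of that kind of proportional downscale (a hypothetical helper, not the actual mtmd/clip implementation):

```cpp
#include <algorithm>
#include <utility>

// Illustrative helper for the capping described above: if either dimension
// exceeds `max_size` (1024 in the commit), scale both dimensions down
// proportionally so the longer side equals `max_size`, preserving aspect
// ratio. This mirrors the idea of the fix, not the real mtmd code.
std::pair<int, int> cap_image_size(int width, int height, int max_size = 1024) {
    const int longer = std::max(width, height);
    if (longer <= max_size) {
        return {width, height}; // already within the limit, keep as-is
    }
    const double scale = static_cast<double>(max_size) / longer;
    return {
        std::max(1, static_cast<int>(width  * scale)),
        std::max(1, static_cast<int>(height * scale)),
    };
}
```

Because encoder memory grows with the number of image patches, bounding the input resolution bounds peak GPU memory regardless of the original image size.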
Sigbjørn Skjæret 40bfa04c95
common : use std::string_view now that we target c++17 (#14319) 2025-06-22 08:37:43 +03:00
Aman Gupta aa064b2eb7
CUDA: add mean operation (#14313)
* CUDA: add mean operation

* add back sum_rows_f32_cuda

* Review: early exit if col!=0
2025-06-22 12:39:54 +08:00
Sigbjørn Skjæret aa0ef5c578
gguf-py : fix Qwen3-Embedding eos token (#14314) 2025-06-21 18:12:05 +02:00
Markus Tavenrath bb16041cae
Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792)
* Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1 compute pipelines are getting labeled.

* remove #ifdef for debug utils and add queue marker.
2025-06-21 08:17:12 +02:00
Sigbjørn Skjæret 58cba76a9a
gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312) 2025-06-21 07:33:21 +02:00
Georgi Gerganov 67ae5312e2
metal : fix thread-safety (#14300)
ggml-ci
2025-06-21 08:04:18 +03:00
Georgi Gerganov 692e3cdd0a
memory : rename interface to llama_memory_context_i (#14296)
* memory : rename interface to llama_memory_context_i

ggml-ci

* cont : fix comments

* cont : use "mctx" for referencing a memory context

ggml-ci
2025-06-21 08:03:46 +03:00
Daniel Han b23fa0b3f4
convert : fix Llama 4 conversion (#14311) 2025-06-21 06:32:01 +02:00
Georgi Gerganov 06cbedfca1
sync : ggml
ggml-ci
2025-06-20 21:02:47 +03:00
Acly b7147673f2
Add `ggml_roll` (ggml/1274)
* ggml : add ggml_roll

* use set/get_op_params & std::min
2025-06-20 21:02:47 +03:00
David Chiu d860dd99a4
docs : fix the link to llama.h (#14293) 2025-06-20 19:43:35 +02:00
Aman Gupta c959f462a0
CUDA: add conv_2d_transpose (#14287)
* CUDA: add conv_2d_transpose

* remove direct include of cuda_fp16

* Review: add brackets for readability, remove ggml_set_param and add asserts
2025-06-20 22:48:24 +08:00
Sigbjørn Skjæret 22015b2092
lint : remove trailing whitespace (#14304) 2025-06-20 16:37:44 +02:00