Yee Man Chan
67bee56013
add Kimi-K2 specific tokens to be recognized as EOG
2026-01-06 21:15:12 +08:00
Yee Man Chan
e3542ff8a2
fixed some comments
2026-01-06 11:35:25 +08:00
Yee Man Chan
cfed14e31b
implemented naive chunking
2026-01-06 11:23:53 +08:00
Yee Man Chan
aba181ebad
removed LOG_INFO
2026-01-05 19:21:06 +08:00
Yee Man Chan
66c0c5d8d4
make Kimi Linear backend-agnostic
2026-01-05 16:35:19 +08:00
Yee Man Chan
a4020d867f
don't quantize conv1d of Kimi Linear
2026-01-03 08:27:29 +08:00
Yee Man Chan
8bd617eb1c
set n_embd_head_k/v to ensure kv cache works
2026-01-03 08:26:41 +08:00
Yee Man Chan
f85e5c73b9
Move KIMI_LINEAR to llm_arch_is_hybrid to enable KV cache
2026-01-02 21:20:34 +08:00
Yee Man Chan
f67a42d572
reduce OP count by 1 due to removal of kda_scan
2025-12-19 07:37:33 +08:00
Yee Man Chan
776294c04e
removed all traces of kda_scan
2025-12-19 07:36:06 +08:00
Yee Man Chan
f9a11d7758
rewrite get_vocab for KimiLinear; remove all kda_scan code
2025-12-18 20:46:10 +08:00
Yee Man Chan
ae9771d1dc
removed unnecessary internal methods called by the old set_vocab of KimiLinear
2025-12-18 08:14:15 +08:00
Yee Man Chan
ef5bc30544
use DeepseekV2 tokenizer
2025-12-14 17:43:30 +08:00
Yee Man Chan
a0269af292
removed all hard-coded values
2025-12-06 11:51:16 +08:00
Yee Man Chan
9f1265fec1
removed some hard-coded code
2025-12-05 19:51:02 +08:00
Yee Man Chan
772ca88070
read MoE params
2025-12-02 20:16:24 +08:00
Yee Man Chan
83d328d0d3
remove type mismatch warning
2025-12-02 14:09:02 +08:00
Yee Man Chan
139548d070
remove "const int64_t n_seq_tokens = q->ne[2];" to get rid of unused variable warning
2025-12-02 12:11:15 +08:00
Yee Man Chan
e308026f64
kimi linear src/llama
2025-12-02 12:02:35 +08:00
Yee Man Chan
d73d3e51a5
Kimi Linear ggml.c
2025-12-02 11:27:57 +08:00
Yee Man Chan
bf42bc0606
Kimi Linear ggml-cuda
2025-12-02 11:24:37 +08:00
Yee Man Chan
26a6553155
kimi linear ggml-cpu
2025-12-02 11:20:46 +08:00
Yee Man Chan
6167f39e08
Kimi Linear ggml.h
2025-12-02 11:14:34 +08:00
Yee Man Chan
57cca52779
kimi linear constants.py tensor_mapping.py
2025-12-02 10:40:44 +08:00
Yee Man Chan
84f822c5a5
kimi linear convert_hf_to_gguf
2025-12-02 08:51:09 +08:00
Yee Man Chan
27baad43d5
kimi linear model implementation
2025-12-02 08:35:14 +08:00
Xuan-Son Nguyen
7733409734
common: improve verbosity level definitions ( #17630 )
* common: improve verbosity level definitions
* string_format
* update autogen docs
2025-12-01 14:38:13 +01:00
Xuan-Son Nguyen
cd3c118908
model: support Ministral3 ( #17644 )
* conversion script
* support ministral 3
* maybe this is better?
* add TODO for rope_yarn_log_mul
* better ppl (tested on 14B-Instruct)
* Add Ministral3 support to Mistral format
* improve arch handling
* add sizes
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* nits
---------
Co-authored-by: Julien Denize <julien.denize@mistral.ai>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-01 12:26:52 +01:00
Georgi Gerganov
649495c9d9
metal : add FA head size 48 ( #17619 )
2025-12-01 12:49:53 +02:00
Georgi Gerganov
90c72a614a
ggml : extend the GGML_SCHED_NO_REALLOC debug logic of the scheduler ( #17617 )
2025-12-01 12:49:33 +02:00
Aman Gupta
6eea666912
llama-graph: avoid expand_forward for fusion ( #17633 )
2025-12-01 11:12:48 +02:00
Xuan-Son Nguyen
ff90508d68
contributing: update guidelines for AI-generated code ( #17625 )
* contributing: update guidelines for AI-generated code
* revise
2025-11-30 22:51:34 +01:00
Adrien Gallouët
0a4aeb927d
cmake : add option to build and link LibreSSL ( #17552 )
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-30 22:14:32 +01:00
Tarek Dakhran
2ba719519d
model: LFM2-VL fixes ( #17577 )
* Adjust to pytorch
* Add antialiasing upscale
* Increase number of patches to 1024
* Handle default marker insertion for LFM2
* Switch to flag
* Reformat
* Cuda implementation of antialias kernel
* Change placement in ops.cpp
* consistent float literals
* Pad only for LFM2
* Address PR feedback
* Rollback default marker placement changes
* Fallback to CPU implementation for antialias implementation of upscale
2025-11-30 21:57:31 +01:00
Xuan-Son Nguyen
7f8ef50cce
clip: fix nb calculation for qwen3-vl ( #17594 )
2025-11-30 15:33:55 +01:00
Xuan-Son Nguyen
3c136b21a3
cli: add migration warning ( #17620 )
2025-11-30 15:32:43 +01:00
Adrien Gallouët
beb1f0c503
common : throttle download progress output to reduce IO flush ( #17427 )
This change limits progress updates to approximately every 0.1% of the
file size to minimize stdio overhead.
Also fixes compiler warnings regarding __func__ in lambdas.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-30 14:22:44 +02:00
Aaron Teo
def5404f26
common: add LLAMA_LOG_FILE env var ( #17609 )
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-11-30 12:12:32 +01:00
Gilad S.
fa0465954f
ggml: fix: macOS build with `-DGGML_BACKEND_DL=ON` ( #17581 )
2025-11-30 10:00:59 +08:00
ddh0
5a6241feb0
common: update env var name ( #17588 )
2025-11-30 09:59:25 +08:00
Aman Gupta
c7af376c29
CUDA: add stream-based concurrency ( #16991 )
* CUDA: add stream-based concurrency
* HIP: fix hipStreamWaitEvent define and nodiscard warnings
* ggml-cuda: fix fusion inside stream
* ggml-cuda: fix bug w.r.t first stream launch
* ggml-cuda: format
* ggml-cuda: improve assert message
* ggml-cuda: use lambda instead of duplicating code
* ggml-cuda: add some more comments
* ggml-cuda: add more detailed comments about concurrency
* ggml-cuda: rename + remove unused var
* ggml-cuda: fix condition for stream launch
* ggml-cuda: address review comments, add destructor
* common.cuh: add is_valid for concurrent events
* common.cuh: make comment better
* update comment
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* update comment
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* common.cuh: fix lower_bound condition + remove join_node data from write_ranges
* ggml-cuda: fix overlap condition + shadowing parameter
---------
Co-authored-by: Carl Philipp Klemm <carl@uvos.xyz>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-11-30 08:17:55 +08:00
Mahekk Shaikh
00425e2ed1
cuda : add error checking for cudaMemcpyAsync in argsort ( #17599 )
* cuda : add error checking for cudaMemcpyAsync in argsort (#12836 )
* fix indentation
2025-11-30 08:16:28 +08:00
Acly
385c3da5e6
vulkan : fix FA mask load with bounds check (coopmat2) ( #17606 )
2025-11-30 01:03:21 +01:00
Xuan-Son Nguyen
ab49f094d2
server: move server-context to its own cpp|h ( #17595 )
* git mv
* add server-context.h
* add server-context.h
* clean up headers
* cont : cleanup
* also expose server_response_reader (to be used by CLI)
* fix windows build
* decouple server_routes and server_http
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-29 22:04:44 +01:00
Haiyue Wang
8c32d9d96d
server: explicitly set the function name in lambda ( #17538 )
As [1] explains, the actual debug message looks like:
"res operator(): operator() : queue result stop"
With the name set explicitly, the message is easier to read when debugging:
"res operator(): recv : queue result stop"
The remaining "operator()" is generated by 'RES_DBG() ... __func__'
[1]: https://clang.llvm.org/extra/clang-tidy/checks/bugprone/lambda-function-name.html
Signed-off-by: Haiyue Wang <haiyuewa@163.com>
2025-11-29 18:43:29 +01:00
Igor Smirnov
0874693b44
common : fix json schema with '\' in literals ( #17307 )
* Fix json schema with '\' in literals
* Add "literal string with escapes" test
2025-11-29 17:06:32 +01:00
Neo Zhang
7d2add51d8
sycl : support allocating more than 4GB of memory on device; update the doc and script ( #17566 )
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2025-11-29 14:59:44 +02:00
ixgbe
f698a79c63
ggml: replace hwcap with riscv_hwprobe for RVV detection ( #17567 )
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-29 14:56:31 +02:00
Ruben Ortlam
47a268ea50
Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support ( #16900 )
* vulkan: split mul_mmq_funcs for mul_mat_vecq use
* add mxfp4 mmvq
* add q2_k mmvq
* add q3_k mmvq
* add q4_k and q5_k mmvq
* add q6_k mmvq
* handle 4x4 quants per mmvq thread
* enable MUL_MAT_ID mmvq support
* enable subgroup optimizations for mul_mat_vec_id shaders
* device tuning
* request prealloc_y sync after quantization
* fix indentation
* fix llvmpipe test failures
* fix mul_mat_id mmvq condition
* fix unused variable warning
2025-11-29 09:37:22 +01:00
Jeff Bolz
59d8d4e963
vulkan: improve topk perf for large k, fix overflow in unit tests ( #17582 )
2025-11-29 08:39:57 +01:00