llama.cpp

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	2fbde785bc	kv-cache : optimize KQ mask construction (#18842) * kv-cache : optimize KQ mask construction * cont : add explanation + improve * cont : fix	2026-01-17 15:42:42 +02:00
Reese Levine	a89002f07b	ggml webgpu: support for backend sampling (#18880) * ggml webgpu: add SOFTPLUS unary operator Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32 precision for intermediate calculations to prevent f16 overflow. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * Follow Vulkan backend numerical stability pattern * ggml webgpu: add EXPM1 unary operator Implements EXPM1 (exp(x) - 1) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * ggml webgpu: add FLOOR unary operator Implements FLOOR (rounds down to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * ggml webgpu: add CEIL unary operator Implements CEIL (rounds up to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * ggml webgpu: add ROUND unary operator Implements ROUND (rounds to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * ggml webgpu: add TRUNC unary operator Implements TRUNC (truncates towards zero) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND, TRUNC, EXPM1, SOFTPLUS) * Updates to webgpu get_memory * Add argmax * Add argmax,cumsum,sum,sum_rows * Add necessary CPY/GET_ROWS operators * Support for argsort using multi-pass strategy * Update set_rows for i32 indices, move to pre-wgsl * Port unary operators to pre-wgsl and support FILL * Implement PAD * Add support for top-k * clean up, scope pipeline init mutex * fix newline * Add support for log * Update LOG for better precision, and ops doc --------- Co-authored-by: Abhijit Ramesh <abhijitramesh2k@gmail.com>	2026-01-16 16:12:43 -08:00
Mustafa Cavus	aa4bc90030	Syntax correction for workflows build file	2026-01-16 13:06:43 -08:00
Thore Koritzius	388ce82241	ggml : extend ggml_pool_1d + metal (#16429) * chore: resolve conflicts * feat: ggml metal impl * fix: ggml_metal_kargs_pool_1d struct * fix: require contiguous input * chore: test pool_1d * chore: limit pool1d test cases to p0=0 and s0=k0 to conform with asserts * chore: add p0 and s0 to testing * fix: allow padding for cpu and metal * Update ggml/src/ggml-metal/ggml-metal.metal * fix: correct single-threaded loop * ggml : cleanup * tests : add ne[1] != 1 tests * fix: ne[1] handling in np * cont : fixes --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-01-16 16:59:56 +02:00
hipudding	6ba6a3c76f	docs : update ops.md for CANN backend (#18654)	2026-01-16 13:32:17 +01:00
Perry Naseck	0802d4cfb3	ggml-blas: hide warnings from included BLAS headers (#18818) * fix compile def openblas, blis for compat libs, nvpl compile def, warn if no blas vendor set * ggml-blas: hide warnings from included BLAS headers	2026-01-16 13:38:25 +02:00
Tarek Dakhran	c945aaaef2	mtmd : Fix ASR for LFM2.5-Audio-1.5B (#18876)	2026-01-16 11:23:08 +01:00
Xuan-Son Nguyen	c15395f73c	common : implement new jinja template engine (#18462) * jinja vm * lexer * add vm types * demo * clean up * parser ok * binary_expression::execute * shadow naming * bin ops works! * fix map object * add string builtins * add more builtins * wip * use mk_val * eval with is_user_input * render gemma tmpl ok * track input string even after transformations * support binded functions * keyword arguments and slicing array * use shared_ptr for values * add mk_stmt * allow print source on exception * fix negate test * testing more templates * mostly works * add filter_statement * allow func to access ctx * add jinja-value.cpp * impl global_from_json * a lot of fixes * more tests * more fix, more tests * more fixes * rm workarounds * demo: type inferrence * add placeholder for tojson * improve function args handling * rm type inference * no more std::regex * trailing spaces * make testing more flexible * make output a bit cleaner * (wip) redirect minja calls * test: add --output * fix crash on macro kwargs * add minimal caps system * add some workarounds * rm caps_apply_workarounds * get rid of preprocessing * more fixes * fix test-chat-template * move test-chat-jinja into test-chat-template * rm test-chat-jinja from cmake * test-chat-template: use common * fix build * fix build (2) * rename vm --> interpreter * improve error reporting * correct lstrip behavior * add tojson * more fixes * disable tests for COMMON_CHAT_FORMAT_GENERIC * make sure tojson output correct order * add object.length * fully functional selectattr / rejectattr * improve error reporting * more builtins added, more fixes * create jinja rendering tests * fix testing.h path * adjust whitespace rules * more fixes * temporary disable test for ibm-granite * r/lstrip behavior matched with hf.js * minimax, glm4.5 ok * add append and pop * kimi-k2 ok * test-chat passed * fix lstrip_block * add more jinja tests * cast to unsigned char * allow dict key to be numeric * nemotron: rm windows newline * tests ok * fix test * rename interpreter --> runtime * fix build * add more checks * bring back generic format support * fix Apertus * [json.exception.out_of_range.403] key 'content' not found * rm generic test * refactor input marking * add docs * fix windows build * clarify error message * improved tests * split/rsplit with maxsplit * non-inverse maxsplit forgot to change after simplifying * implement separators for tojson and fix indent * i like to move it move it * rename null -- > none * token::eof * some nits + comments * add exception classes for lexer and parser * null -> none * rename global -> env * rm minja * update docs * docs: add input marking caveats * imlement missing jinja-tests functions * oops * support trim filter with args, remove bogus to_json reference * numerous argument fixes * updated tests * implement optional strip chars parameter * use new chars parameter * float filter also has default * always leave at least one decimal in float string * jinja : static analysis + header cleanup + minor fixes * add fuzz test * add string.cpp * fix chat_template_kwargs * nits * fix build * revert * unrevert sorry :) * add fuzz func_args, refactor to be safer * fix array.map() * loosen ensure_vals max count condition, add not impl for map(int) * hopefully fix windows * check if empty first * normalize newlines --------- Co-authored-by: Alde Rojas <hello@alde.dev> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-01-16 11:22:06 +01:00
Julius Tischbein	aa1dc3770a	Setting mmap and direct_io to false as default in llama-bench.cpp (#18841)	2026-01-16 09:46:51 +01:00
Raul Torres	4ea2eaac01	CANN: Remove unused `ggml_cann_get_device` function (#18625)	2026-01-16 16:34:09 +08:00
Chenguang Li	e20fa27a02	CANN: fix an issue where get_env was not fully renamed (#18796) * CANN: fix an issue where get_env was not fully renamed * ci: add cann with acl group * ci: define use_acl_graph using GitHub Action * ci: update cann dockerfile with acl graph	2026-01-16 16:24:04 +08:00
hipudding	baa4ba0aec	CANN: support gated linear attn (#18653) * CANN: support gated linear attn This change adds support for the GGML_OP_GATED_LINEAR_ATTN operator. The feature was implemented by YushengZhao. Because the previous submission was based on an outdated codebase, this PR was rebased to merge. Co-authored-by: YushengZhao <yusheng.chao@outlook.com> Co-authored-by: hipudding <huafengchun@gmail.com> * CANN: optimize OP gla Optimize gla for high preformance * Remove unused comments --------- Co-authored-by: 赵禹昇 <2501112001@cninfer02.localdomain> Co-authored-by: YushengZhao <yusheng.chao@outlook.com>	2026-01-16 16:18:49 +08:00
Mustafa Cavus	d7dccf887b	kq_mask naming fix	2026-01-15 14:38:53 -08:00
Yamini Nimmagadda	d3649c11cb	Update OPENVINO.md	2026-01-15 11:39:08 -08:00
Yamini Nimmagadda	e9ed5c4cb6	Update OPENVINO.md	2026-01-15 11:39:08 -08:00
Yamini Nimmagadda	f44c60e995	Update OPENVINO.md	2026-01-15 11:39:08 -08:00
Yamini Nimmagadda	63eed0d9f3	Update build.md	2026-01-15 11:39:08 -08:00
Yamini Nimmagadda	61552e4450	Update OPENVINO.md	2026-01-15 11:39:08 -08:00
Yamini Nimmagadda	9ba324726a	Update OPENVINO.md	2026-01-15 11:39:08 -08:00
Yamini Nimmagadda	25e652569b	Update OPENVINO.md	2026-01-15 11:39:08 -08:00
Yamini Nimmagadda	416556a87d	Create OPENVINO.md in llama.cpp backend docs	2026-01-15 11:39:08 -08:00
Mustafa Cavus	599335c633	Update ggml/src/ggml-openvino/ggml-openvino-extra.cpp	2026-01-15 11:39:08 -08:00
Mustafa Cavus	a92eceecd9	Update ggml/src/ggml-openvino/ggml-decoder.cpp	2026-01-15 11:39:08 -08:00
Mustafa Cavus	a81b202f57	requant to f16 for Q6 embed on NPU	2026-01-15 11:39:08 -08:00
Mustafa Cavus	a40a5dfc60	npu perf fix	2026-01-15 11:39:08 -08:00
Mustafa Cavus	981ec6571d	code cleanup	2026-01-15 11:39:08 -08:00
Mustafa Cavus	d2fc15226b	Update ggml/src/ggml-openvino/ggml-decoder.cpp Co-authored-by: Yamini Nimmagadda <yamini.nimmagadda@intel.com>	2026-01-15 11:39:08 -08:00
Mustafa Cavus	5f30eacdb4	Initial stateful graph support	2026-01-15 11:39:08 -08:00
Yu, Zijun	0d6f253e48	Support -ctk f32	2026-01-15 11:39:08 -08:00
Yu, Zijun	f5c71e3cf4	Update build.md	2026-01-15 11:39:08 -08:00
Yu, Zijun	4e451778d3	Use Q8_0_C in token embd, lm_head, and for 5 and 6 bits quant	2026-01-15 11:39:08 -08:00
Yu, Zijun	67c9720e49	Optimize symmetric quant weight extraction: use single zp	2026-01-15 11:39:08 -08:00
Yu, Zijun	c1142ddb7c	NPU always requant to q4_0_128	2026-01-15 11:39:08 -08:00
Yu, Zijun	52a44012c0	Update build.md to include OpenCL	2026-01-15 11:39:08 -08:00
Yu, Zijun	cfc471353d	FIX: use remote tensor from singleton	2026-01-15 11:39:08 -08:00
Yu, Zijun	a356b44477	only use remote tensor for kvcache for GPU	2026-01-15 11:39:08 -08:00
Yu, Zijun	88d1d17eac	only use remote tensor for kvcache	2026-01-15 11:39:08 -08:00
Yu, Zijun	8273a7c2f4	Use ggml_aligned_malloc	2026-01-15 11:39:08 -08:00
Yu, Zijun	d757849741	Put kvcache on GPU	2026-01-15 11:39:08 -08:00
Yu, Zijun	3fdcb6ab72	Add ov_backend_host_buffer; Use cached remote context	2026-01-15 11:39:08 -08:00
Yu, Zijun	72bba828df	Use shared_buffer for GPU NPU; Refactor	2026-01-15 11:39:08 -08:00
Yu, Zijun	22d9c17a6f	backend buffer: allocate on host	2026-01-15 11:39:08 -08:00
Arshath	ae5336386f	Update build.md for Windows	2026-01-15 11:39:08 -08:00
Yu, Zijun	0ef2e5e4d4	Fix decoder can_reuse for llama-bench	2026-01-15 11:39:08 -08:00
Xuejun Zhai	9e3163e846	Remove unused variable nodes	2026-01-15 11:39:08 -08:00
Yu, Zijun	c9234b44cc	NPU fix q4 perf regression	2026-01-15 11:39:08 -08:00
Yu, Zijun	ae01322dbd	NPU fix wrong model output shape	2026-01-15 11:39:08 -08:00
Yu, Zijun	469325c6da	GPU remove Q6_K requantization	2026-01-15 11:39:08 -08:00
Yu, Zijun	28da9a9adc	Reuse cached decoder	2026-01-15 11:39:08 -08:00
Xuejun Zhai	91a1b20c82	Fix error for decoder cache	2026-01-15 11:39:08 -08:00

8297 Commits