llama.cpp

Commit Graph

Author	SHA1	Message	Date
Xuan Son Nguyen	e858b7a0a3	add minimal caps system	2026-01-02 16:28:04 +01:00
Xuan Son Nguyen	0f9f986ace	test: add --output	2026-01-02 11:33:42 +01:00
Xuan Son Nguyen	a66e4a4f5d	make output a bit cleaner	2026-01-01 23:07:45 +01:00
Xuan Son Nguyen	b23b5e3c01	make testing more flexible	2026-01-01 23:02:30 +01:00
Xuan Son Nguyen	d34efd9626	rm type inference	2025-12-31 11:43:53 +01:00
Xuan Son Nguyen	cbb37dd4cd	improve function args handling	2025-12-31 11:29:40 +01:00
Xuan Son Nguyen	1b213ae5e7	add placeholder for tojson	2025-12-30 21:52:47 +01:00
Xuan Son Nguyen	4479c382ce	demo: type inferrence	2025-12-30 17:26:23 +01:00
Xuan Son Nguyen	9c0fa6f810	rm workarounds	2025-12-30 16:07:23 +01:00
Xuan Son Nguyen	9e9a70f72f	more fixes	2025-12-29 15:07:18 +01:00
Xuan Son Nguyen	026730e8e3	more fix, more tests	2025-12-29 12:53:31 +01:00
Xuan Son Nguyen	1cf25734a9	more tests	2025-12-29 10:53:32 +01:00
Xuan Son Nguyen	2a31c9a30c	a lot of fixes	2025-12-29 00:38:29 +01:00
Xuan Son Nguyen	1784a57e7b	impl global_from_json	2025-12-28 23:15:48 +01:00
Xuan Son Nguyen	adad34f64d	add filter_statement	2025-12-28 22:02:22 +01:00
Xuan Son Nguyen	9a8a45ff3b	mostly works	2025-12-28 21:32:55 +01:00
Xuan Son Nguyen	45df0c91e7	testing more templates	2025-12-28 19:50:09 +01:00
Xuan Son Nguyen	db09a7468d	fix negate test	2025-12-28 19:07:01 +01:00
Xuan Son Nguyen	acb0effa25	allow print source on exception	2025-12-28 18:45:41 +01:00
Xuan Son Nguyen	7f17608ea4	use shared_ptr for values	2025-12-28 17:46:25 +01:00
Xuan Son Nguyen	4331e9c8e9	keyword arguments and slicing array	2025-12-28 17:23:29 +01:00
Xuan Son Nguyen	45c194622e	support binded functions	2025-12-28 15:33:14 +01:00
Xuan Son Nguyen	4ca114b095	track input string even after transformations	2025-12-28 12:48:35 +01:00
Xuan Son Nguyen	81310d29c1	render gemma tmpl ok	2025-12-28 12:04:23 +01:00
Xuan Son Nguyen	10835f2720	eval with is_user_input	2025-12-27 23:25:20 +01:00
Xuan Son Nguyen	da7bbe5813	wip	2025-12-27 22:25:19 +01:00
Xuan Son Nguyen	15b3dbab05	add string builtins	2025-12-27 21:52:50 +01:00
Xuan Son Nguyen	d8ef00e610	bin ops works!	2025-12-27 20:16:46 +01:00
Xuan Son Nguyen	8cea1ed6b0	parser ok	2025-12-27 12:55:01 +01:00
Xuan Son Nguyen	15b7c50e95	lexer	2025-12-25 21:08:51 +01:00
Jeff Bolz	e3b35ddf1c	vulkan: Extend rope fusions to allow mrope (#18264) Extend the test-backend-ops tests as well.	2025-12-22 11:03:13 -06:00
Johannes Gäßler	147a521636	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
Jeff Bolz	fd05c51cec	vulkan: fix im2col overflowing maxworkgroupcount (#18180)	2025-12-21 10:32:58 +01:00
Jeff Bolz	b365c3ff01	vulkan/cuda: fix topk_moe with exp_probs_b (#18071) I updated test_topk_moe to more closely match llm_graph_context::build_moe_ffn and added coverage for exp_probs_b and some other missing combinations. This exposed a bug in both CUDA and Vulkan backends where they were assuming the input to argsort and the input to get_rows are the same. I'd like to optimize this graph in another change, but for now just get it functional. CUDA also had a bug where it got n_experts from the wrong place, leading to GGML_ASSERT failures in some of the new tests.	2025-12-21 10:27:34 +01:00
Jeff Bolz	52ab19df63	tests: Avoid floating point precision false positives in SUM (#17471) * tests: Avoid floating point precision false positives in SUM * also apply to test_mean	2025-12-20 13:46:46 -06:00
Jeff Bolz	5182dd64cd	test-backend-ops: improve msvc build time (#18209)	2025-12-20 13:45:45 -06:00
Xuan-Son Nguyen	9e39a1e6a9	server: support load model on startup, support preset-only options (#18206) * server: support autoload model, support preset-only options * add docs * load-on-startup * fix * Update common/arg.cpp Co-authored-by: Pascal <admin@serveurperso.com> --------- Co-authored-by: Pascal <admin@serveurperso.com>	2025-12-20 09:25:27 +01:00
Pascal	14931a826e	arg: fix order to use short form before long form (#18196) * arg: fix order to use short form before long form * arg: update doc * arg: update test-arg-parser * arg: address review feedback from ngxson simplified to check first.length() <= last.length() only fixed: --sampler-seq, --rerank, --draft ordering note: middle positions in 3+ arg sets are not verified * arg: update doc	2025-12-19 18:01:56 +01:00
Xuan-Son Nguyen	8ea958d4d9	model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106) * ASR with LFM2-Audio-1.5B * Set rope_theta * Fix comment * Remove rope_theta setting * Address PR feedback * rename functions to conformer * remove some redundant ggml_cont * fix missing tensor * add prefix "a." for conv tensors * remove redundant reshape * clean up * add test model --------- Co-authored-by: Tarek Dakhran <tarek@liquid.ai>	2025-12-19 00:18:01 +01:00
Aldehir Rojas	c05aa69f32	common : add nemotron 3 parsing (#18077) * common : expose json-schema functionality to extract type info * common : fix peg parser negation during needs_more_input * common : add some defensive measures in constructed peg parser * common : add nemotron nano 3 support * common : add nemotron nano 3 tests * remove debug line	2025-12-16 04:05:23 -06:00
ssweens	4529c660c8	kv-cache: Fix state restore fragmented cache (#17982) * kv-cache : fix state restore with fragmented cache (#17527) Change find_slot to allow non-contiguous allocation during state restore. Fixes 'failed to find available cells in kv cache' error when restoring state to fragmented cache. * tests : update logic * cleanup: tightened state_read_meta sig, added is_contiguous case * fix: state_read_meta arg reorder loose ends --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-15 19:28:35 +02:00
Xuan-Son Nguyen	4d5ae24c0a	arg: fix common_params_parse not accepting negated arg (#17991)	2025-12-13 12:53:37 +01:00
Jeff Bolz	303f8615e9	vulkan: Multi-pass softmax for large number of cols (#17892) When the number of cols is large, split each row across multiple workgroups. There are three phases that communicate partial results through temp buffers: (1) compute max partials (2) take max of partials, compute sum(exp(x-max)) partials (3) sum partials, compute scaled result	2025-12-13 10:04:29 +01:00
Jeff Bolz	07a10c1090	vulkan: Allow non-pow2 n_experts in topk_moe (#17872)	2025-12-13 08:40:04 +01:00
Xuan-Son Nguyen	380b4c984e	common: support negated args (#17919) * args: support negated args * update docs * fix typo * add more neg options * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * rm duplicated arg * fix LLAMA_ARG_NO_HOST * add test --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-12 23:58:53 +01:00
Piotr Wilkin (ilintar)	53ecd4fdb9	SOLVE_TRI extension to more dimensions (#17793) * Extended TRI * Fix whitespace * chore: update webui build output * Just use cuBLAS for everything... * Merge both versions * Remove incorrect imports causing failures for CI * Still failing... remove all direct cublas imports and rely on common imports from "common.cuh" * Defines for hipBlas * Aaaand MUSA defines... * I hate this job... * Stupid typo... * Update ggml/src/ggml-cuda/solve_tri.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-12-11 17:20:43 +01:00
Max Krasnyansky	e1f4921980	Fix race conditions in threadpool when dealing with dynamic/frequent n_threads changes (#17748) * tests: update barrier test to check for race condition in active threads * cpu: combine n_graph and n_threads into a single atomic update * tests: add multi-graph test for test_barrier	2025-12-10 12:32:23 -08:00
Georgi Gerganov	4dff236a52	ggml : remove GGML_KQ_MASK_PAD constant (#17910) * ggml : remove GGML_KQ_MASK_PAD constant * cont : remove comment	2025-12-10 20:53:16 +02:00
Xuan-Son Nguyen	6c2131773c	cli: new CLI experience (#17824) * wip * wip * fix logging, add display info * handle commands * add args * wip * move old cli to llama-completion * rm deprecation notice * move server to a shared library * move ci to llama-completion * add loading animation * add --show-timings arg * add /read command, improve LOG_ERR * add args for speculative decoding, enable show timings by default * add arg --image and --audio * fix windows build * support reasoning_content * fix llama2c workflow * color default is auto * fix merge conflicts * properly fix color problem Co-authored-by: bandoti <bandoti@users.noreply.github.com> * better loading spinner * make sure to clean color on force-exit * also clear input files on "/clear" * simplify common_log_flush * add warning in mtmd-cli * implement console writter * fix data race * add attribute * fix llama-completion and mtmd-cli * add some notes about console::log * fix compilation --------- Co-authored-by: bandoti <bandoti@users.noreply.github.com>	2025-12-10 15:28:59 +01:00
Aldehir Rojas	2fbe3b7bb7	common : add parser for ministral/mistral large 3/devstral 2 (#17713)	2025-12-09 17:31:04 -06:00


626 Commits