llama.cpp

Commit Graph

Author	SHA1	Message	Date
Yuri Khrustalev	c053e18a66	chat: Add LFM2 tool handling (#16763 ) * Add LFM2 tool handling * fmt * Apply suggestion from @ykhrustalev	2025-10-27 23:54:01 +01:00
Xuan-Son Nguyen	e1ab084803	mtmd : fix idefics3 preprocessing (#16806 ) * mtmd : fix idefics3 preprocessing * disable granite test * fix test for granite	2025-10-27 23:12:16 +01:00
bssrdf	6d12288037	WIP: fixed a bug in cpy transpose index computation	2025-10-27 17:32:03 -04:00
Diego Devesa	5a4ff43e7d	llama : disable pipeline parallelism if compute buffer allocation fails (#16748 )	2025-10-27 21:51:28 +01:00
Acly	10640e31aa	ggml : fix interpolate with align-corners and ne=1 (#16700 ) * ggml : fix interpolate with align-corners and ne=1 * avoid division by zero if one of the spatial dimensions is 1 * cpu, cuda, opencl returned correct result anyway due to clamp * vulkan didn't clamp for align-corners so results were broken * fix clang warning	2025-10-27 21:50:22 +01:00
Johannes Gäßler	80d28f104c	HIP: fix AMDGPU_TARGETS, update documentation (#16803 )	2025-10-27 21:39:49 +01:00
bssrdf	a3784e17ad	WIP: debugging cpy transpose	2025-10-27 15:09:03 -04:00
bssrdf	cc327f5224	added a specialization for cuda copy op when tensor is transposed	2025-10-27 11:23:27 -04:00
Xuan-Son Nguyen	c55d53acec	model : add LightOnOCR-1B model (#16764 ) * model : add LightOnOCR-1B model * add test	2025-10-27 16:02:58 +01:00
bssrdf	30990788e8	WIP	2025-10-27 08:29:20 -04:00
Johannes Gäßler	945501f5ea	llama: fix leaked buffers for mmap + split files (#16765 )	2025-10-27 09:17:31 +01:00
Aman Gupta	75cbdd3fce	test-backend-ops: print failed tests at the end (#16785 )	2025-10-27 09:25:10 +08:00
tamarPal	2b9bd9bf4e	sycl: add ROLL operation support (#16665 ) * sycl: add ROLL operation support - Implement ggml_sycl_roll function for F32 tensors - Add multi-axis roll operation with SYCL kernel - Support all 4 tensor dimensions with proper shift normalization - Add roll.cpp and roll.hpp to SYCL backend - Update backend dispatch and supports_op for GGML_OP_ROLL - Tests: 17662/17662 pass with identical CPU reference results * fix: remove trailing whitespace from roll.cpp - Fix EditorConfig violations in ggml/src/ggml-sycl/roll.cpp - Remove trailing spaces from lines 6, 11, 28, 47, 58, 60 * ci: retrigger * sycl: remove wait() calls from ROLL operation * fix: editorconfig — LF endings + final newline for roll.hpp --------- Co-authored-by: tamarPal <tamarPal@example.com>	2025-10-27 09:20:24 +08:00
shani-f	59fc1ec8e8	sycl: add REPEAT_BACK operation support (#16734 ) * SYCL repeat_back v1 — add core op + switch case * Implement repeat_back SYCL operation and minor fixes * Update ggml/src/ggml-sycl/repeat_back.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update ggml/src/ggml-sycl/repeat_back.hpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update ggml/src/ggml-sycl/ggml-sycl.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-10-27 09:19:50 +08:00
Aman Gupta	75d33b9302	CUDA: support for weight clamp in top-k norm (#16702 )	2025-10-27 09:06:16 +08:00
Acly	3470a5c891	ggml-alloc : make gallocr prefer chunks that allow memory reuse (#16788 )	2025-10-26 23:19:03 +01:00
Sigbjørn Skjæret	bd562fe4f7	cuda : use fast copy when src and dst are of different type and contiguous (#16789 ) * use fast copy when src and dst are contiguous and same shape * use int64_t ne and ignore shape	2025-10-26 21:31:41 +01:00
leejet	bbac6a26b2	ggml: fix cuda kernel launch configuration for k_compute_batched_ptrs to support large batch (#16744 ) * fix k_compute_batched_ptrs * add backend ops test * Update ggml/src/ggml-cuda/ggml-cuda.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * reduce the batch size --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-10-26 19:13:31 +01:00
Sigbjørn Skjæret	73a48c9790	convert : enable expert group selection for all models with it (#16691 )	2025-10-26 17:21:23 +01:00
Sigbjørn Skjæret	f696428ce8	graph : add clamping to ffn_moe_weights_sum to avoid div-by-zero (#16655 ) * add missing norm topk bias * use clamping instead, update number and add comment	2025-10-26 17:20:32 +01:00
Sigbjørn Skjæret	7cce4f8158	model : set res->t_embd in SmallThinker models (#16782 )	2025-10-26 16:08:52 +01:00
amirai21	8d8862829c	docs : add Jamba to Text-only models list (#16778 )	2025-10-26 13:01:20 +01:00
Aman Gupta	f77c13b91f	CUDA: General GEMV fusion (#16715 )	2025-10-26 19:28:04 +08:00
Gilad S.	3cfa9c3f12	vulkan: deduplicate Microsoft Direct3D12 devices (#16689 ) * fix: deduplicate and deprioritize Microsoft Direct3D12 vulkan devices from the `vulkan-dozen` driver * style: indent * fix: decrease priority * fix: switch to `||`	2025-10-26 05:37:38 +01:00
bssrdf	c68fe36ae2	WIP: cleanup; enhanced test case	2025-10-25 21:57:39 -04:00
bssrdf	475f9879c5	WIP: fixed another bug	2025-10-25 20:24:14 -04:00
bssrdf	396f55831c	WIP: bug fix	2025-10-25 18:14:12 -04:00
Galunid	5d195f17bc	convert : handle mmproj filename/path properly (#16760 ) * convert: handle mmproj model output filename properly * remove redundant commits * Add model_type to gguf utility * Use mmproj- prefix instead of suffix * Apply CISC suggestion Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-10-25 20:41:36 +02:00
bssrdf	610e41ae2d	still debugging	2025-10-25 11:10:39 -04:00
Shunta Saito	226f295f4d	model : set res->t_embd in PLaMo2 models (#16766 )	2025-10-25 12:26:27 +02:00
Giuseppe Scrivano	f90b4a8efe	vulkan: delete dead code (#16732 ) ggml_vk_create_buffer_temp is not used anywhere, and it is the only caller for ggml_vk_pool_malloc. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2025-10-25 10:59:54 +02:00
Jeff Bolz	8423d01931	vulkan: Optimize SSM_SCAN (#16645 )	2025-10-25 07:04:12 +02:00
bssrdf	c45df12ee7	this case is broken; to be debugged	2025-10-24 22:40:34 -04:00
bssrdf	980ddc1e87	properly use __CUDA_ARCH__ to protect the tensor path	2025-10-24 21:56:58 -04:00
compilade	5cca2542ac	convert : avoid dequantizing mxfp4 for GPT-OSS (#16756 )	2025-10-24 20:52:00 -04:00
bssrdf	24b553204b	WIP: fixed another bug	2025-10-24 16:53:40 -04:00
leejet	55945d2ef5	ggml: fix CUDA grid launch condition for large block_nums.y in binbcast (#16742 ) * Fix CUDA grid launch condition for large block_nums.y * add backend ops test * reduce test repetitions	2025-10-24 21:39:37 +02:00
bssrdf	6c90c20cb1	WIP: bug fix	2025-10-24 15:33:57 -04:00
bssrdf	be25be8ed3	WIP: debugging tensor core kernel	2025-10-24 14:24:26 -04:00
bssrdf	80a996cfc0	WIP: tensor core code compiled ok	2025-10-24 11:41:11 -04:00
Aman Gupta	0bcb40b48c	CUDA: use CUB for arbitrary size argsort (#16754 )	2025-10-24 20:46:19 +08:00
Florian Badie	69e9ff0103	webui: support q URL parameter (#16728 ) * webui: support q URL parameter Fixes #16722 I’ve checked that it works with Firefox’s AI tools * webui: apply suggestions from code review Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * chore: update webui static build --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-10-24 14:10:29 +02:00
Daniel Bevenius	5a91109a5d	model-conversion : add trust_remote_code for orig model run [no ci] (#16751 ) This commit adds the trust_remote_code=True argument when loading models using AutoConfig, AutoTokenizer, and AutoModelForCausalLM for the run original model script. The motivation for this is that some models require custom code to be loaded properly, and setting trust_remote_code=True avoids a prompt asking for user confirmation: ```console (venv) $ make causal-run-original-model The repository /path/to/model contains custom code which must be executed to correctly load the model. You can inspect the repository content at /path/to/model. Do you wish to run the custom code? [y/N] N ``` Having this as the default seems like a safe choice as we have to clone or download the models we convert and would be expecting to run any custom code they have.	2025-10-24 12:02:02 +02:00
bssrdf	2715341c1d	WIP: output	2025-10-23 21:29:45 -04:00
compilade	f8f071fadd	convert : handle pre-quantized models (#14810 ) * convert : begin handling pre-quantized models * convert : fix conversion from FP8 for Deepseek-V3.1-Base	2025-10-23 16:31:41 -04:00
Johannes Gäßler	0bf47a1dbb	server: add memory breakdown print (#16740 )	2025-10-23 21:30:17 +02:00
bssrdf	66f6d16265	WIP	2025-10-23 13:52:26 -04:00
Julien Denize	dd62dcfab9	convert : Make mistral-common dependency optional (#16738 ) * Make mistral-common dependency optional * Fix typing	2025-10-23 15:54:46 +02:00
Xuan-Son Nguyen	d0660f237a	mtmd-cli : allow using --jinja (#16718 ) * mtmd-cli : allow using --jinja * support -sys * implement chat_history * fix clear memory * rm -sys support, added TODO	2025-10-23 15:00:49 +02:00
Prajwal B Mehendarkar	fe6a9882ac	Manually link -lbsd to resolve flock symbol on AIX (#16610 )	2025-10-23 19:37:31 +08:00

7999 Commits