llama.cpp/ggml/src/ggml-cpu
Jeff Bolz e68aa10d8f
vulkan: sort graph to allow more parallel execution (#15850)
* vulkan: sort graph to allow more parallel execution

Add a backend proc that allows the backend to modify the graph. The
vulkan implementation examines which nodes depend on each other and
greedily reorders them so that mutually independent nodes are grouped
together. It only reorders the nodes; it does not change the contents
of any of them.

With #15489, this reduces the number of synchronizations needed.

* call optimize_graph per-split
2025-09-09 02:10:07 +08:00
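The reordering idea can be illustrated with a short sketch. This is not the actual vulkan backend code from #15850; the `Node` type, its `srcs` field, and `sort_graph` are hypothetical stand-ins, and the sketch uses a simple wave-by-wave (level-order) topological sort to show how mutually independent nodes can be grouped, assuming the graph is an acyclic compute DAG.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compute-graph node: 'srcs' holds the
// indices of the nodes this node reads from (its data dependencies).
struct Node {
    std::vector<size_t> srcs;
};

// Reorder nodes so that mutually independent nodes are grouped into
// consecutive "waves". Within a wave no node depends on another, so a
// backend only needs a synchronization barrier between waves.
// Returns the new execution order as indices into 'nodes'.
// Assumes 'nodes' forms a DAG (no cycles), or the loop never finishes.
std::vector<size_t> sort_graph(const std::vector<Node> & nodes) {
    const size_t n = nodes.size();
    std::vector<bool>   emitted(n, false);
    std::vector<size_t> order;
    order.reserve(n);

    while (order.size() < n) {
        // Greedily collect every not-yet-emitted node whose inputs have
        // all been emitted in earlier waves.
        std::vector<size_t> wave;
        for (size_t i = 0; i < n; ++i) {
            if (emitted[i]) {
                continue;
            }
            bool ready = true;
            for (size_t s : nodes[i].srcs) {
                if (!emitted[s]) { ready = false; break; }
            }
            if (ready) {
                wave.push_back(i);
            }
        }
        // Mark the wave only after collecting all of it, so two nodes in
        // the same wave can never depend on each other.
        for (size_t i : wave) {
            emitted[i] = true;
        }
        order.insert(order.end(), wave.begin(), wave.end());
    }
    return order;
}
```

Because adjacent nodes inside a wave have no dependencies on each other, barriers are needed only between waves rather than between every pair of adjacent nodes, which is how grouping independent nodes reduces the synchronization count.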
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| amx | ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317) | 2025-06-25 23:49:04 +02:00 |
| arch | ggml-cpu: clean up s390x SIMD (#15855) | 2025-09-08 02:18:28 +08:00 |
| cmake | ggml : build backends as libraries (#10256) | 2024-11-14 18:04:35 +01:00 |
| kleidiai | kleidiai: generalize compute_forward_kv_cache to compute_forward_fp16 (#15817) | 2025-09-06 22:08:43 +08:00 |
| llamafile | llamafile: PowerPC Sgemm Optimization (#15558) | 2025-08-26 23:35:25 +08:00 |
| CMakeLists.txt | ggml-cpu: drop support for nnpa intrinsics (#15821) | 2025-09-06 11:27:28 +08:00 |
| arch-fallback.h | ggml-cpu: Support Q5_0 and Q5_1 on s390x (#15486) | 2025-08-22 16:11:04 +08:00 |
| binary-ops.cpp | cpu: de-duplicate some of the operators and refactor (ggml/1144) | 2025-03-30 08:33:31 +03:00 |
| binary-ops.h | cpu: de-duplicate some of the operators and refactor (ggml/1144) | 2025-03-30 08:33:31 +03:00 |
| common.h | ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317) | 2025-06-25 23:49:04 +02:00 |
| ggml-cpu-impl.h | ggml-cpu: clean up s390x SIMD (#15855) | 2025-09-08 02:18:28 +08:00 |
| ggml-cpu.c | ggml: allow casting between f32 and i32 (#15783) | 2025-09-08 12:33:01 +02:00 |
| ggml-cpu.cpp | vulkan: sort graph to allow more parallel execution (#15850) | 2025-09-09 02:10:07 +08:00 |
| hbm.cpp | ggml-cpu : split arch-specific implementations (#13892) | 2025-06-09 16:47:13 +02:00 |
| hbm.h | ggml-cpu : split arch-specific implementations (#13892) | 2025-06-09 16:47:13 +02:00 |
| ops.cpp | ggml: allow casting between f32 and i32 (#15783) | 2025-09-08 12:33:01 +02:00 |
| ops.h | ggml: add ops for WAN video model (cuda && cpu) (#15669) | 2025-09-04 10:38:49 +02:00 |
| quants.c | llama : add gpt-oss (#15091) | 2025-08-05 22:10:36 +03:00 |
| quants.h | llama : add gpt-oss (#15091) | 2025-08-05 22:10:36 +03:00 |
| repack.cpp | ggml : repack block_iq4_nlx8 (#14904) | 2025-08-13 11:09:39 +03:00 |
| repack.h | ggml : repack block_iq4_nlx8 (#14904) | 2025-08-13 11:09:39 +03:00 |
| simd-mappings.h | ggml-cpu: drop support for nnpa intrinsics (#15821) | 2025-09-06 11:27:28 +08:00 |
| traits.cpp | ggml : fix fallback to CPU for unsupported ops (#15118) | 2025-08-06 14:37:35 +02:00 |
| traits.h | ggml : fix fallback to CPU for unsupported ops (#15118) | 2025-08-06 14:37:35 +02:00 |
| unary-ops.cpp | cpu: de-duplicate some of the operators and refactor (ggml/1144) | 2025-03-30 08:33:31 +03:00 |
| unary-ops.h | cpu: de-duplicate some of the operators and refactor (ggml/1144) | 2025-03-30 08:33:31 +03:00 |
| vec.cpp | ggml-cpu : optimize RVV kernels (#15720) | 2025-09-03 16:16:21 +08:00 |
| vec.h | ggml-cpu : optimize RVV kernels (#15720) | 2025-09-03 16:16:21 +08:00 |