llama.cpp/ggml/src/ggml-cpu
Jeff Bolz e68aa10d8f
vulkan: sort graph to allow more parallel execution (#15850)
* vulkan: sort graph to allow more parallel execution

Add a backend proc that allows the backend to modify the graph. The
vulkan implementation examines which nodes depend on each other and
greedily reorders them so that mutually independent nodes are grouped
together. It only reorders the nodes; it does not change the contents
of any of them.

With #15489, this reduces the number of synchronizations needed.

* call optimize_graph per-split
2025-09-09 02:10:07 +08:00
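The reordering idea can be illustrated with a short sketch. This is not the actual vulkan backend code from #15850; the `Node` type, its `srcs` field, and `sort_graph` are hypothetical stand-ins, and the sketch uses a simple wave-by-wave (level-order) topological sort to show how mutually independent nodes can be grouped, assuming the graph is an acyclic compute DAG.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compute-graph node: 'srcs' holds the
// indices of the nodes this node reads from (its data dependencies).
struct Node {
    std::vector<size_t> srcs;
};

// Reorder nodes so that mutually independent nodes are grouped into
// consecutive "waves". Within a wave no node depends on another, so a
// backend only needs a synchronization barrier between waves.
// Returns the new execution order as indices into 'nodes'.
// Assumes 'nodes' forms a DAG (no cycles), or the loop never finishes.
std::vector<size_t> sort_graph(const std::vector<Node> & nodes) {
    const size_t n = nodes.size();
    std::vector<bool>   emitted(n, false);
    std::vector<size_t> order;
    order.reserve(n);

    while (order.size() < n) {
        // Greedily collect every not-yet-emitted node whose inputs have
        // all been emitted in earlier waves.
        std::vector<size_t> wave;
        for (size_t i = 0; i < n; ++i) {
            if (emitted[i]) {
                continue;
            }
            bool ready = true;
            for (size_t s : nodes[i].srcs) {
                if (!emitted[s]) { ready = false; break; }
            }
            if (ready) {
                wave.push_back(i);
            }
        }
        // Mark the wave only after collecting all of it, so two nodes in
        // the same wave can never depend on each other.
        for (size_t i : wave) {
            emitted[i] = true;
        }
        order.insert(order.end(), wave.begin(), wave.end());
    }
    return order;
}
```

Because adjacent nodes inside a wave have no dependencies on each other, barriers are needed only between waves rather than between every pair of adjacent nodes, which is how grouping independent nodes reduces the synchronization count.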
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| amx | ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317) | 2025-06-25 23:49:04 +02:00 |
| arch | ggml-cpu: clean up s390x SIMD (#15855) | 2025-09-08 02:18:28 +08:00 |
| cmake | ggml : build backends as libraries (#10256) | 2024-11-14 18:04:35 +01:00 |
| kleidiai | kleidiai: generalize compute_forward_kv_cache to compute_forward_fp16 (#15817) | 2025-09-06 22:08:43 +08:00 |
| llamafile | llamafile: PowerPC Sgemm Optimization (#15558) | 2025-08-26 23:35:25 +08:00 |
| CMakeLists.txt | ggml-cpu: drop support for nnpa intrinsics (#15821) | 2025-09-06 11:27:28 +08:00 |
| arch-fallback.h | ggml-cpu: Support Q5_0 and Q5_1 on s390x (#15486) | 2025-08-22 16:11:04 +08:00 |
| binary-ops.cpp | cpu: de-duplicate some of the operators and refactor (ggml/1144) | 2025-03-30 08:33:31 +03:00 |
| binary-ops.h | cpu: de-duplicate some of the operators and refactor (ggml/1144) | 2025-03-30 08:33:31 +03:00 |
| common.h | ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317) | 2025-06-25 23:49:04 +02:00 |
| ggml-cpu-impl.h | ggml-cpu: clean up s390x SIMD (#15855) | 2025-09-08 02:18:28 +08:00 |
| ggml-cpu.c | ggml: allow casting between f32 and i32 (#15783) | 2025-09-08 12:33:01 +02:00 |
| ggml-cpu.cpp | vulkan: sort graph to allow more parallel execution (#15850) | 2025-09-09 02:10:07 +08:00 |
| hbm.cpp | ggml-cpu : split arch-specific implementations (#13892) | 2025-06-09 16:47:13 +02:00 |
| hbm.h | ggml-cpu : split arch-specific implementations (#13892) | 2025-06-09 16:47:13 +02:00 |
| ops.cpp | ggml: allow casting between f32 and i32 (#15783) | 2025-09-08 12:33:01 +02:00 |
| ops.h | ggml: add ops for WAN video model (cuda && cpu) (#15669) | 2025-09-04 10:38:49 +02:00 |
| quants.c | llama : add gpt-oss (#15091) | 2025-08-05 22:10:36 +03:00 |
| quants.h | llama : add gpt-oss (#15091) | 2025-08-05 22:10:36 +03:00 |
| repack.cpp | ggml : repack block_iq4_nlx8 (#14904) | 2025-08-13 11:09:39 +03:00 |
| repack.h | ggml : repack block_iq4_nlx8 (#14904) | 2025-08-13 11:09:39 +03:00 |
| simd-mappings.h | ggml-cpu: drop support for nnpa intrinsics (#15821) | 2025-09-06 11:27:28 +08:00 |
| traits.cpp | ggml : fix fallback to CPU for unsupported ops (#15118) | 2025-08-06 14:37:35 +02:00 |
| traits.h | ggml : fix fallback to CPU for unsupported ops (#15118) | 2025-08-06 14:37:35 +02:00 |
| unary-ops.cpp | cpu: de-duplicate some of the operators and refactor (ggml/1144) | 2025-03-30 08:33:31 +03:00 |
| unary-ops.h | cpu: de-duplicate some of the operators and refactor (ggml/1144) | 2025-03-30 08:33:31 +03:00 |
| vec.cpp | ggml-cpu : optimize RVV kernels (#15720) | 2025-09-03 16:16:21 +08:00 |
| vec.h | ggml-cpu : optimize RVV kernels (#15720) | 2025-09-03 16:16:21 +08:00 |