llama.cpp

Jeff Bolz	e68aa10d8f	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00

* vulkan: sort graph to allow more parallel execution

  Add a backend proc to allow the backend to modify the graph. The vulkan implementation looks at which nodes depend on each other and greedily reorders them to group together nodes that don't depend on each other. It only reorders the nodes, doesn't change the contents of any of them.

  With #15489, this reduces the number of synchronizations needed.

* call optimize_graph per-split

cmake	ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)	2025-08-07 13:45:41 +02:00
include	cuda : fix supports_op condition for get_rows when number of blocks is too large (#15868)	2025-09-08 13:56:51 +03:00
src	vulkan: sort graph to allow more parallel execution (#15850)	2025-09-09 02:10:07 +08:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	ggml-cpu: drop support for nnpa intrinsics (#15821)	2025-09-06 11:27:28 +08:00