llama.cpp

Chenguang Li 28b5f190ef CANN: implement LRU cache for ACL graphs (#15814)	2025-09-10 15:29:12 +08:00

* CANN: implement LRU cache for ACL graphs in CANN backend

- Introduce ggml_cann_graph_lru_cache to store multiple ggml_cann_graph objects.
- Graphs are loaded on demand and evicted using LRU policy when capacity is exceeded.
- Updated push, move_to_front, and clear methods to manage cached graphs efficiently.
- Ensures reuse of graphs, reducing graph reconstruction overhead in CANN backend.

* fix typo

* The LRU cache capacity can be configured via an env variable

Signed-off-by: noemotiovon <757486878@qq.com>

* refactory acl graph

* refactory && fix review comments

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>
..
BLIS.md	make : deprecate (#10514)	2024-12-02 21:22:53 +02:00
CANN.md	CANN: implement LRU cache for ACL graphs (#15814)	2025-09-10 15:29:12 +08:00
CUDA-FEDORA.md	docs: update: improve the Fedoa CUDA guide (#12536)	2025-03-24 11:02:26 +00:00
OPENCL.md	opencl: update doc for OpenCL (#12702)	2025-04-03 22:18:17 -07:00
SYCL.md	sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973)	2025-06-25 18:09:55 +02:00