llama.cpp

Chenguang Li	28b5f190ef	CANN: implement LRU cache for ACL graphs (#15814)	2025-09-10 15:29:12 +08:00

* CANN: implement LRU cache for ACL graphs in CANN backend
  - Introduce ggml_cann_graph_lru_cache to store multiple ggml_cann_graph objects.
  - Graphs are loaded on demand and evicted using LRU policy when capacity is exceeded.
  - Updated push, move_to_front, and clear methods to manage cached graphs efficiently.
  - Ensures reuse of graphs, reducing graph reconstruction overhead in CANN backend.
* fix typo
* The LRU cache capacity can be configured via an env variable
  Signed-off-by: noemotiovon <757486878@qq.com>
* refactory acl graph
* refactory && fix review comments
  Signed-off-by: noemotiovon <757486878@qq.com>
---------
Signed-off-by: noemotiovon <757486878@qq.com>
..
backend	CANN: implement LRU cache for ACL graphs (#15814)	2025-09-10 15:29:12 +08:00
development	docs : update HOWTO‑add‑model.md for ModelBase and new model classes (#14874)	2025-07-25 16:25:05 +02:00
multimodal	model : support MiniCPM-V 4.5 (#15575)	2025-08-26 10:05:55 +02:00
ops	ggml: initial IBM zDNN backend (#14975)	2025-08-15 21:11:22 +08:00
android.md	repo : update links to new url (#11886)	2025-02-15 16:40:57 +02:00
build-s390x.md	ggml-cpu: drop support for nnpa intrinsics (#15821)	2025-09-06 11:27:28 +08:00
build.md	Update build.md to remove MSVC arm64 notes (#15684)	2025-08-30 23:51:28 +08:00
docker.md	musa: upgrade musa sdk to rc4.2.0 (#14498)	2025-07-24 20:05:37 +01:00
function-calling.md	server : add documentation for `parallel_tool_calls` param (#15647)	2025-08-29 20:25:40 +03:00
install.md	docs : add "Quick start" section for new users (#13862)	2025-06-03 13:09:36 +02:00
llguidance.md	llguidance build fixes for Windows (#11664)	2025-02-14 12:46:08 -08:00
multimodal.md	mtmd : add support for Voxtral (#14862)	2025-07-28 15:01:48 +02:00
ops.md	ggml: initial IBM zDNN backend (#14975)	2025-08-15 21:11:22 +08:00