Add a new llama_pre_alloc_callback that fires after graph construction but before memory allocation in llama_decode/llama_encode. This allows downstream consumers to call ggml_backend_sched_set_tensor_backend() to route specific ops (e.g. attention) to a different backend without modifying llama.cpp internals.

Changes:
- Add llama_pre_alloc_callback typedef to llama.h
- Add cb_pre_alloc + cb_pre_alloc_user_data to llama_context_params and llama_cparams
- Invoke callback in process_ubatch() between build_graph and alloc_graph
- Add test that verifies callback invocation and backend reassignment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Files changed:
- llama-cpp.h
- llama.h