llama.cpp/include
Aristeidis Stathopoulos d13ed9f2f1 llama: add cb_pre_alloc callback for pre-allocation backend reassignment
Add a new llama_pre_alloc_callback that fires after graph construction
but before memory allocation in llama_decode/llama_encode. This allows
downstream consumers to call ggml_backend_sched_set_tensor_backend()
to route specific ops (e.g. attention) to a different backend without
modifying llama.cpp internals.

Changes:
- Add llama_pre_alloc_callback typedef to llama.h
- Add cb_pre_alloc + cb_pre_alloc_user_data to llama_context_params
  and llama_cparams
- Invoke callback in process_ubatch() between build_graph and
  alloc_graph
- Add test that verifies callback invocation and backend reassignment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 11:39:43 +02:00
llama-cpp.h lora: make sure model keep track of associated adapters (#18490) 2026-01-15 10:24:28 +01:00
llama.h llama: add cb_pre_alloc callback for pre-allocation backend reassignment 2026-03-07 11:39:43 +02:00