- Remove unused #include <cstring>
- Fix false positive when only one backend is available
- Clarify comment: "reassign graph nodes" instead of "reassign ops"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add missing llama_backend_free() on model load failure path
- Only print diagnostics on failure, not on success
- Pick the target backend by finding one different from the current
backend, instead of assuming a fixed backend ordering
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a new llama_pre_alloc_callback that fires after graph construction
but before memory allocation in llama_decode/llama_encode. This allows
downstream consumers to call ggml_backend_sched_set_tensor_backend()
to route specific graph nodes (e.g. attention) to a different backend without
modifying llama.cpp internals.
Changes:
- Add llama_pre_alloc_callback typedef to llama.h
- Add cb_pre_alloc + cb_pre_alloc_user_data to llama_context_params
and llama_cparams
- Invoke callback in process_ubatch() between build_graph and
alloc_graph
- Add test that verifies callback invocation and backend reassignment
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>