squash! sampling : support intermixed backend/cpu samplers

Fix llama-save-load-state, which currently fails, by handling the case where batch.logits is nullptr (as when loading state): allocate space for all outputs as CPU logits.
2025-11-28 13:46:51 +01:00 · 459b7ae7b9
parent 9ad6522be6
commit 459b7ae7b9
1 changed file with 4 additions and 0 deletions
--- a/src/llama-context.cpp
+++ b/src/llama-context.cpp
@@ -1676,6 +1676,10 @@ uint32_t llama_context::output_reserve(int32_t n_outputs, const llama_batch & ba
                }
            }
        }
    } else {
        // When batch.logits is nullptr (when loading state with a dummy batch),
        // allocate CPU logits.
        batch_needs_cpu_logits = true;
    }
    size_t backend_float_count = 0;
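
The decision this hunk changes can be illustrated in isolation. The following is a minimal sketch, not the code in llama-context.cpp: `batch_sketch`, `needs_cpu_logits`, and the `backend_seqs` set are hypothetical names, and the sketch assumes one sequence id per token and that a sequence without a backend sampler is sampled on the CPU. It shows only the fallback added here: when the per-token flags pointer is null, CPU logits are requested for all outputs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for a batch: one sequence id per token.
struct batch_sketch {
    std::vector<int32_t> seq_id;           // sequence id of each token
    const int8_t *       logits = nullptr; // per-token output flags, or nullptr
};

// Returns true if CPU logits must be allocated for this batch.
// backend_seqs: sequence ids assumed (for this sketch) to have a backend sampler.
static bool needs_cpu_logits(const batch_sketch & batch,
                             const std::set<int32_t> & backend_seqs) {
    if (batch.logits == nullptr) {
        // No per-token flags (e.g. a dummy batch used while loading state):
        // be conservative and request CPU logits for all outputs.
        return true;
    }
    for (std::size_t i = 0; i < batch.seq_id.size(); ++i) {
        if (batch.logits[i] == 0) {
            continue; // this token produces no output
        }
        if (backend_seqs.count(batch.seq_id[i]) == 0) {
            return true; // output belongs to a CPU-sampled sequence
        }
    }
    return false; // every requested output is handled by a backend sampler
}
```

With a null `logits` pointer the function returns true regardless of which sequences have backend samplers, which mirrors the `else` branch added in this commit.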