llama.cpp/examples
Daniele Pinna 1488339138
lookup, lookahead: fix crash when n_ctx not specified (#18729)
* lookup, lookahead: fix crash when n_ctx not specified

Since PR #16653 (Dec 15, 2025), the default n_ctx is 0 to enable automatic
GPU memory fitting. This causes llama-lookup and llama-lookahead to crash
when run without an explicit -c flag:

    GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded")

Root cause: both examples pass params.n_ctx directly to batch initialization,
but params.n_ctx stays 0 even though the context itself is initialized to
n_ctx_train internally. The batch is therefore created with zero capacity,
and the first common_batch_add() call trips the assertion above.

Bug history:
- Nov 2023: lookahead.cpp created (PR #4207) with params.n_ctx pattern
- Dec 2023: lookup.cpp created (PR #4484) with same pattern
- Nov 2024: default n_ctx changed to 4096 (PR #10136) - bug dormant
- Dec 2025: default n_ctx changed to 0 (PR #16653) - bug activated

The bug was dormant for 2+ years because params.n_ctx defaulted to 512,
then 4096. PR #16653 changed it to 0 for GPU auto-fitting, triggering
the crash.

Fix: Use llama_n_ctx(ctx) to get the actual runtime context size, matching
the pattern already used elsewhere in lookup.cpp (line 72) and in
speculative.cpp/speculative-simple.cpp.

Tested: llama-lookup now works without the -c flag (12.5% acceptance on
Gemma-3-1B).

Note: llama-lookahead has a separate pre-existing issue with sequence
initialization (n_seq_max=1 where W + G + 1 is needed) that is unrelated to
this fix.

* lookahead: fix n_seq_max and kv_unified configuration

Lookahead decoding requires:
- W + G + 1 = 31 sequences for parallel Jacobi decoding
- Unified KV cache for coupled sequences in batch splitting

These requirements were broken after PR #14482 changed the validation logic.
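In common_params terms the second fix is roughly the following (a sketch;
it assumes the n_parallel and kv_unified fields map to n_seq_max and the
unified KV cache, and uses the W/G defaults from lookahead.cpp):

    const int W = 15; // lookahead window width
    const int G = 15; // number of verification n-grams

    params.n_parallel = W + G + 1; // 31 sequences for parallel Jacobi decoding
    params.kv_unified = true;      // coupled sequences need a unified KV cache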

Consolidates the fix from PR #18730 per maintainer request.

Commit message drafted with Claude.
2026-01-30 22:10:24 +02:00
batched context : reserve new scheduler when graph topology changes (#18547) 2026-01-15 16:39:17 +02:00
batched.swift examples : remove references to `make` in examples [no ci] (#15457) 2025-08-21 06:12:28 +02:00
convert-llama2c-to-ggml gguf: gguf_writer refactor (#15691) 2025-09-05 11:34:28 +02:00
debug Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914) 2026-01-14 20:29:35 +01:00
deprecation-warning Update deprecation-warning.cpp (#10619) 2024-12-04 23:19:20 +01:00
diffusion llama : add `use_direct_io` flag for model loading (#18166) 2026-01-08 08:35:30 +02:00
embedding model : add LFM2-ColBert-350M (#18607) 2026-01-05 19:52:56 +01:00
eval-callback tests : download models only when running ctest (#18843) 2026-01-15 09:47:29 +01:00
gen-docs gen-docs: automatically update markdown file (#18294) 2025-12-22 19:30:19 +01:00
gguf examples(gguf): GGUF example outputs (#17025) 2025-11-05 19:58:16 +02:00
gguf-hash GGUF: C++ refactor, backend support, misc fixes (#11030) 2025-01-07 18:01:58 +01:00
idle metal : add residency sets keep-alive heartbeat (#17766) 2025-12-05 19:38:54 +02:00
llama.android refactor : remove libcurl, use OpenSSL when available (#18828) 2026-01-14 18:02:47 +01:00
llama.swiftui llama : deprecate llama_kv_self_ API (#14030) 2025-06-06 14:11:15 +03:00
lookahead lookup, lookahead: fix crash when n_ctx not specified (#18729) 2026-01-30 22:10:24 +02:00
lookup lookup, lookahead: fix crash when n_ctx not specified (#18729) 2026-01-30 22:10:24 +02:00
model-conversion model-conversion : use BUILD_DIR variable in all scripts (#19015) 2026-01-23 09:01:36 +01:00
parallel common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
passkey examples : remove references to `make` in examples [no ci] (#15457) 2025-08-21 06:12:28 +02:00
retrieval model : add LFM2-ColBert-350M (#18607) 2026-01-05 19:52:56 +01:00
save-load-state common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
simple examples : support encoder-decoder models in the simple example (#16002) 2025-09-17 10:29:00 +03:00
simple-chat simple-chat : fix context-exceeded condition (#14494) 2025-07-02 14:12:07 +03:00
simple-cmake-pkg examples : add missing code block end marker [no ci] (#17756) 2025-12-04 14:17:30 +01:00
speculative spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
speculative-simple spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
sycl refactor : remove libcurl, use OpenSSL when available (#18828) 2026-01-14 18:02:47 +01:00
training common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
CMakeLists.txt examples : add debug utility/example (#18464) 2026-01-07 10:42:19 +01:00
convert_legacy_llama.py metadata: Detailed Dataset Authorship Metadata (#8875) 2024-11-13 21:10:38 +11:00
json_schema_pydantic_example.py py : type-check all Python scripts with Pyright (#8341) 2024-07-07 15:04:39 -04:00
json_schema_to_grammar.py common : fix json schema with '\' in literals (#17307) 2025-11-29 17:06:32 +01:00
llama.vim llama : remove KV cache defragmentation logic (#15473) 2025-08-22 12:22:13 +03:00
pydantic_models_to_grammar.py pydantic : replace uses of __annotations__ with get_type_hints (#8474) 2024-07-14 19:51:21 -04:00
pydantic_models_to_grammar_examples.py llama : move end-user examples to tools directory (#13249) 2025-05-02 20:27:13 +02:00
reason-act.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
regex_to_grammar.py py : switch to snake_case (#8305) 2024-07-05 07:53:33 +03:00
server-llama2-13B.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
server_embd.py llama : fix FA when KV cache is not used (i.e. embeddings) (#12825) 2025-04-08 19:54:51 +03:00
ts-type-to-grammar.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00