When shift_context() discards tokens to free KV cache space, it decrements current_position but not stop_generation_position. As a result, the termination check (current_position >= stop_generation_position) never triggers, and text generation runs indefinitely. Fix by also decrementing stop_generation_position by n_discard. Fixes #18409

| Name |
|---|
| src |
| .gitignore |
| build.gradle.kts |
| consumer-rules.pro |
| proguard-rules.pro |
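
The bookkeeping issue described in the commit message can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: the names current_position, stop_generation_position, n_discard and shift_context come from the commit message, while GenerationState, run_generation, the apply_fix flag and the step_cap guard are hypothetical helpers introduced here for illustration. The sketch does not touch a real KV cache, and it is not the project's actual implementation.

```cpp
// Hypothetical stand-in for the generation bookkeeping. No real KV cache is
// involved; only the two position counters are modeled.
struct GenerationState {
    int current_position;          // index of the next token to generate
    int stop_generation_position;  // generation stops once current_position reaches this
};

// Simulates a context shift that discards n_discard tokens.
// With apply_fix == false, only current_position moves back (the reported bug).
// With apply_fix == true, both counters move back by the same amount.
void shift_context(GenerationState &state, int n_discard, bool apply_fix) {
    state.current_position -= n_discard;
    if (apply_fix) {
        state.stop_generation_position -= n_discard;
    }
}

// Generates tokens until the termination check fires.
// Returns the number of tokens generated, or -1 if step_cap iterations pass
// without termination (step_cap is an assumption used to detect the runaway
// loop without actually looping forever).
int run_generation(int context_size, int start_position, int max_new_tokens,
                   int n_discard, bool apply_fix, int step_cap) {
    GenerationState state{start_position, start_position + max_new_tokens};
    int generated = 0;
    while (state.current_position < state.stop_generation_position) {
        if (generated >= step_cap) {
            return -1;  // termination check never triggered
        }
        if (state.current_position >= context_size) {
            // Context is full: discard tokens to make room.
            shift_context(state, n_discard, apply_fix);
        }
        ++state.current_position;
        ++generated;
    }
    return generated;
}
```

With the illustrative values context_size = 8, start_position = 4, max_new_tokens = 10 and n_discard = 4, calling run_generation with apply_fix set to true returns 10, because the stop position moves back together with the current position on every shift. With apply_fix set to false, each shift pulls current_position back to 4 while the stop position stays at 14, so the loop only ends when it hits step_cap and returns -1.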