# llama.cpp/examples/simple-token-healing
This example extends the `simple` example with token healing.

Without token healing:

```
./simple ./models/phi-2/ggml-model-q4_0.gguf "print('Hel"
...
main: n_len = 32, n_ctx = 2048, n_kv_req = 32

print('Helping the customer')
...
```

Heal the last token (`1`):

```
./simple-token-healing ./models/phi-2/ggml-model-q4_0.gguf "print('Hel" 1
...
token_healing: prefix = 'Hel' (1 tokens)
 [ 12621] 'Hel'
 [ 15496] 'Hello'
 [ 22087] 'Help'
 [ 28254] 'Hell'
 [ 47429] 'Helper'

main: n_len = 32, n_ctx = 2048, n_kv_req = 32

print('Hello, World!')
...
```

Backtrack multiple tokens, stopping as soon as no single vocabulary token can cover the prompt's suffix (`n`):

```
./simple-token-healing ./models/phi-2/ggml-model-q4_0.gguf "print('Hello, worl" n
...
token_healing: prefix = ' worl' (2 tokens)
 [   995] ' world'
 [  8688] ' worldwide'
 [ 11621] ' worlds'
 [ 29081] ' worldview'
 [ 43249] ' worldly'

main: n_len = 32, n_ctx = 2048, n_kv_req = 32

print('Hello, world!')
...
```

Backtrack multiple tokens but don't constrain the decoding to a single token (`m`):

```
./simple-token-healing ./models/phi-2/ggml-model-q4_0.gguf "print('Hello, worl" m
...
token_healing: prefix = ' worl' (2 tokens)

main: n_len = 32, n_ctx = 2048, n_kv_req = 32

print('Hello,
token_healing: prefix = ' worl'
 [   220] ' '
 [   266] ' w'
 [   476] ' wor'
 [   995] ' world'
 [  8688] ' worldwide'
 [ 11621] ' worlds'
 [ 24486] ' wo'
 [ 29081] ' worldview'
 [ 43249] ' worldly'
 world!')
...
```