llama.cpp/examples/tokenize
Mikko Juola cd7b5f7f78 Make tokenizer.cpp CLI tool nicer.
Before this commit, tokenize was a simple CLI tool like this:

  tokenize MODEL_FILENAME PROMPT [--ids]

This simple tool loads the model, takes the prompt, and shows the tokens
llama.cpp is interpreting.

This changeset makes tokenize more sophisticated and more useful
for debugging and troubleshooting:

  tokenize [-m, --model MODEL_FILENAME]
           [--ids]
           [--stdin]
           [--prompt]
           [-f, --file]
           [--no-bos]
           [--log-disable]

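As a rough illustration of how the flag set above could be handled, here is
a minimal, self-contained sketch of an argument parser. It is not the code
from tokenize.cpp: the TokenizeOptions struct, the parse_args function, and
the field names are hypothetical. The sketch also assumes that --model,
--prompt, and --file each take a value, that a model is required, and that
exactly one prompt source (--prompt, --file, or --stdin) must be given; the
usage summary does not state any of this.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical options struct; names are illustrative, not from tokenize.cpp.
struct TokenizeOptions {
    std::string model_path;
    std::optional<std::string> prompt;       // value of --prompt (assumed to take one)
    std::optional<std::string> prompt_file;  // value of -f/--file (assumed to take one)
    bool print_ids = false;                  // --ids
    bool read_stdin = false;                 // --stdin
    bool no_bos = false;                     // --no-bos
    bool log_disable = false;                // --log-disable
};

// Parse the arguments that follow the program name.
// Throws std::invalid_argument on unknown flags or missing values.
TokenizeOptions parse_args(const std::vector<std::string>& args) {
    TokenizeOptions opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        // Fetch the value following a flag, or fail if the list ends first.
        auto next_value = [&]() -> const std::string& {
            if (i + 1 >= args.size()) {
                throw std::invalid_argument("missing value for " + arg);
            }
            return args[++i];
        };
        if (arg == "-m" || arg == "--model") {
            opts.model_path = next_value();
        } else if (arg == "--prompt") {
            opts.prompt = next_value();
        } else if (arg == "-f" || arg == "--file") {
            opts.prompt_file = next_value();
        } else if (arg == "--ids") {
            opts.print_ids = true;
        } else if (arg == "--stdin") {
            opts.read_stdin = true;
        } else if (arg == "--no-bos") {
            opts.no_bos = true;
        } else if (arg == "--log-disable") {
            opts.log_disable = true;
        } else {
            throw std::invalid_argument("unknown argument: " + arg);
        }
    }
    // Assumption: a model is mandatory and exactly one prompt source is allowed.
    if (opts.model_path.empty()) {
        throw std::invalid_argument("a model must be given with -m/--model");
    }
    const int sources = (opts.prompt.has_value() ? 1 : 0) +
                        (opts.prompt_file.has_value() ? 1 : 0) +
                        (opts.read_stdin ? 1 : 0);
    if (sources != 1) {
        throw std::invalid_argument("give exactly one of --prompt, --file, --stdin");
    }
    return opts;
}
```

A caller would build the vector from argv + 1 up to argv + argc and pass it
to parse_args, catching std::invalid_argument to print a usage message.
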
It also behaves better on Windows now, interpreting and rendering Unicode
from command-line arguments and pipes regardless of the code page the user
has set in their terminal.
2024-03-26 14:13:02 -07:00
CMakeLists.txt examples : add tokenize (#4039) 2023-11-17 17:36:44 +02:00
tokenize.cpp Make tokenizer.cpp CLI tool nicer. 2024-03-26 14:13:02 -07:00