llama.cpp/gguf-py/scripts
Sigbjørn Skjæret 9e4968cf67
Add special token modification capability
To make it possible to fix or amend special tokens in a GGUF, add two new arguments:
* `--special-token <name> <value>`, where `<name>` can be bos, eos, prefix, middle, etc. and `<value>` is the token value, e.g. `"<|fim▁begin|>"`
* `--special-token-by-id <name> <id>`, where `<id>` is the ID of the token, e.g. 32006

For example, to add fill-in-middle tokens to a GGUF you would do the following:
```bash
python3 gguf-new-metadata.py input.gguf output.gguf --special-token prefix "<|fim▁begin|>" --special-token middle "<|fim▁hole|>" --special-token suffix "<|fim▁end|>"
```
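
As a rough illustration of how two repeatable `<name> <value>` arguments like these can be collected and turned into metadata updates, here is a minimal, self-contained sketch. It is not the code from `gguf-new-metadata.py`: the helpers `metadata_key`, `build_parser` and `resolve_special_tokens` are hypothetical, the vocabulary is a plain Python list standing in for the model's token list, and the key pattern `tokenizer.ggml.<name>_token_id` is an assumption that should be checked against the GGUF key constants.

```python
import argparse


def metadata_key(name: str) -> str:
    # Assumption: special-token keys follow "tokenizer.ggml.<name>_token_id".
    return f"tokenizer.ggml.{name}_token_id"


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the two arguments described above.
    parser = argparse.ArgumentParser()
    parser.add_argument("input")
    parser.add_argument("output")
    # action="append" with nargs=2 yields one [name, value] pair per occurrence.
    parser.add_argument("--special-token", action="append", nargs=2,
                        metavar=("NAME", "VALUE"), default=[])
    parser.add_argument("--special-token-by-id", action="append", nargs=2,
                        metavar=("NAME", "ID"), default=[])
    return parser


def resolve_special_tokens(args: argparse.Namespace, vocab: list[str]) -> dict[str, int]:
    """Map each requested special token to a metadata key and a token ID."""
    updates: dict[str, int] = {}
    for name, value in args.special_token:
        # Look the token text up in the vocabulary to find its ID.
        if value not in vocab:
            raise ValueError(f"token {value!r} not found in vocabulary")
        updates[metadata_key(name)] = vocab.index(value)
    for name, raw_id in args.special_token_by_id:
        token_id = int(raw_id)
        # Reject IDs that do not point at an existing token.
        if not 0 <= token_id < len(vocab):
            raise ValueError(f"token ID {token_id} is out of range")
        updates[metadata_key(name)] = token_id
    return updates
```

Given a parsed command line and a vocabulary, `resolve_special_tokens(build_parser().parse_args(argv), vocab)` returns a dict of metadata keys to token IDs. Looking tokens up by value keeps the command line readable, while the by-ID form covers tokens whose text is awkward to type or appears more than once.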
2024-04-20 08:33:54 +02:00

| File | Last commit | Date |
|---|---|---|
| `__init__.py` | convert : support models with multiple chat templates (#6588) | 2024-04-18 14:49:01 +03:00 |
| `gguf-convert-endian.py` | Fix gguf-convert-endian script (#4037) | 2023-11-11 08:35:31 -07:00 |
| `gguf-dump.py` | Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040) | 2023-11-16 19:14:37 -07:00 |
| `gguf-new-metadata.py` | Add special token modification capability | 2024-04-20 08:33:54 +02:00 |
| `gguf-set-metadata.py` | gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981) | 2023-11-11 08:04:50 +03:00 |