llama.cpp/examples
Francis Couture-Harpin 8e39037b86 llama : refactor session file management
* llama : saving and restoring state checks for overflow

The size of the buffers should now be given to the functions working
with them, otherwise a truncated file could cause out of bound reads.

* llama : stream from session file instead of copying into a big buffer

Loading session files should no longer cause a memory usage spike.

* llama : llama_state_get_size returns the actual size instead of max

This is a breaking change, but makes that function *much* easier
to keep up to date, and it also makes it reflect the behavior
of llama_state_seq_get_size.

* llama : share code between whole and seq_id-specific state saving

Both session file types now use a more similar format.

* llama : no longer store all hparams in session files

Instead, the model arch name is stored.
The layer count and the embedding dimensions of the KV cache
are still verified when loading.
Storing all the hparams is not necessary.
2024-07-25 21:40:26 -04:00
..
baby-llama `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
batched batched: fix n_predict parameter (#8527) 2024-07-17 10:34:28 +03:00
batched-bench `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
batched.swift Detokenizer fixes (#8039) 2024-07-05 19:01:35 +02:00
benchmark `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
convert-llama2c-to-ggml `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
cvector-generator cvector: better prompt handling, add "mean vector" method (#8069) 2024-06-25 13:59:54 +02:00
deprecation-warning Deprecation warning to assist with migration to new binary names (#8283) 2024-07-09 11:54:43 -04:00
embedding Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) 2024-07-02 12:18:10 -04:00
eval-callback examples : sprintf -> snprintf (#8434) 2024-07-12 10:46:14 +03:00
export-lora examples : Fix `llama-export-lora` example (#8607) 2024-07-23 23:48:37 +02:00
finetune py : type-check all Python scripts with Pyright (#8341) 2024-07-07 15:04:39 -04:00
gbnf-validator llama : move vocab, grammar and sampling into separate files (#8508) 2024-07-23 13:10:17 +03:00
gguf gguf : handle null name during init (#8587) 2024-07-20 17:15:42 +03:00
gguf-hash gguf-hash : update clib.json to point to original xxhash repo (#8491) 2024-07-16 10:14:16 +03:00
gguf-split `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
gritlm llama : allow pooled embeddings on any model (#7477) 2024-06-21 08:38:22 +03:00
imatrix llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
infill infill : assert prefix/suffix tokens + remove old space logic (#8351) 2024-07-08 09:34:35 +03:00
jeopardy `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
llama-bench [CANN] Add Ascend NPU backend (#6035) 2024-07-17 14:23:50 +03:00
llama.android examples: fix android example cannot be generated continuously (#8621) 2024-07-22 09:54:42 +03:00
llama.swiftui llama.swiftui: fix end of generation bug (#8268) 2024-07-20 16:09:37 +03:00
llava [CANN] Add Ascend NPU backend (#6035) 2024-07-17 14:23:50 +03:00
lookahead `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
lookup lookup: fibonacci hashing, fix crashes (#8548) 2024-07-17 23:35:44 +02:00
main main : print error on empty input (#8456) 2024-07-12 14:48:04 +03:00
main-cmake-pkg Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) 2024-07-02 12:18:10 -04:00
parallel `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
passkey passkey : add short intro to README.md [no-ci] (#8317) 2024-07-05 09:14:24 +03:00
perplexity ppl : fix n_seq_max for perplexity (#8277) 2024-07-03 20:33:31 +03:00
quantize llama : valign + remove unused ftype (#8502) 2024-07-16 10:00:30 +03:00
quantize-stats ggml : minor naming changes (#8433) 2024-07-12 10:46:02 +03:00
retrieval llama : allow pooled embeddings on any model (#7477) 2024-06-21 08:38:22 +03:00
rpc llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
save-load-state llama : refactor session file management 2024-07-25 21:40:26 -04:00
server server : fix URL.parse in the UI (#8646) 2024-07-23 17:37:42 +03:00
simple `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
speculative `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
sycl Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) 2024-07-02 12:18:10 -04:00
tokenize tokenize : add --no-parse-special option (#8423) 2024-07-11 10:41:48 +03:00
train-text-from-scratch py : type-check all Python scripts with Pyright (#8341) 2024-07-07 15:04:39 -04:00
CMakeLists.txt gguf-hash: model wide and per tensor hashing using xxhash and sha1 (#8048) 2024-07-07 22:58:43 +10:00
Miku.sh `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
base-translate.sh `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
chat-13B.bat Create chat-13B.bat (#592) 2023-03-29 20:21:09 +03:00
chat-13B.sh `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
chat-persistent.sh `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
chat-vicuna.sh `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
chat.sh `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
convert_legacy_llama.py convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499) 2024-07-18 20:40:15 +10:00
json_schema_pydantic_example.py py : type-check all Python scripts with Pyright (#8341) 2024-07-07 15:04:39 -04:00
json_schema_to_grammar.py py : type-check all Python scripts with Pyright (#8341) 2024-07-07 15:04:39 -04:00
llama.vim llama.vim : added api key support (#5090) 2024-01-23 08:51:27 +02:00
llm.vim llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879) 2023-08-30 09:50:55 +03:00
pydantic_models_to_grammar.py pydantic : replace uses of __annotations__ with get_type_hints (#8474) 2024-07-14 19:51:21 -04:00
pydantic_models_to_grammar_examples.py examples : Rewrite pydantic_models_to_grammar_examples.py (#8493) 2024-07-20 22:09:17 -04:00
reason-act.sh `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
regex_to_grammar.py py : switch to snake_case (#8305) 2024-07-05 07:53:33 +03:00
server-llama2-13B.sh `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
server_embd.py py : type-check all Python scripts with Pyright (#8341) 2024-07-07 15:04:39 -04:00
ts-type-to-grammar.sh JSON schema conversion: ️ faster repetitions, min/maxLength for strings, cap number length (#6555) 2024-04-12 19:43:38 +01:00