llama.cpp/examples
ManniX-ITA d9157cdf34
Update server.cpp example with correct startup sequence
The HTTP listener start and the health API endpoint are moved before the model loading starts, hence the server can correctly report is loading the model
2024-04-17 18:55:09 +02:00
..
baby-llama code : normalize enum names (#5697) 2024-02-25 12:09:09 +02:00
batched metal : pad n_ctx by 32 (#6177) 2024-03-22 09:36:03 +02:00
batched-bench bench : make n_batch and n_ubatch configurable in Batched bench (#6500) 2024-04-05 21:34:53 +03:00
batched.swift ggml : add numa options (#5377) 2024-02-16 11:31:07 +02:00
beam-search ggml : add numa options (#5377) 2024-02-16 11:31:07 +02:00
benchmark ggml : remove old quantization functions (#5942) 2024-03-09 15:53:59 +02:00
convert-llama2c-to-ggml llama2c : open file as binary (#6332) 2024-03-27 09:16:02 +02:00
embedding BERT tokenizer fixes (#6498) 2024-04-09 13:44:08 -04:00
eval-callback model: support arch `DbrxForCausalLM` (#6515) 2024-04-13 11:33:52 +02:00
export-lora ci : add an option to fail on compile warning (#3952) 2024-02-17 23:03:14 +02:00
finetune code : normalize enum names (#5697) 2024-02-25 12:09:09 +02:00
gbnf-validator grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609) 2024-04-11 19:47:34 +01:00
gguf gguf : add option to not check tensor data (#6582) 2024-04-10 21:16:48 +03:00
gguf-split Fix --split-max-size (#6655) 2024-04-14 13:12:59 +02:00
gritlm gritlm : add --outdir option to hf.sh script (#6699) 2024-04-16 09:34:06 +03:00
imatrix imatrix : remove invalid assert (#6632) 2024-04-12 11:49:58 +03:00
infill infill : add download instructions for model (#6626) 2024-04-12 15:11:46 +03:00
jeopardy parallel : add option to load external prompt file (#3416) 2023-10-06 16:16:38 +03:00
llama-bench ggml : add llamafile sgemm (#6414) 2024-04-16 21:55:30 +03:00
llama.android android : fix utf8 decoding error (#5935) 2024-03-10 22:03:17 +02:00
llama.swiftui llama : add pipeline parallelism support (#6017) 2024-03-13 18:54:21 +01:00
llava chore: Fix markdown warnings (#6625) 2024-04-12 10:52:36 +02:00
lookahead BERT tokenizer fixes (#6498) 2024-04-09 13:44:08 -04:00
lookup BERT tokenizer fixes (#6498) 2024-04-09 13:44:08 -04:00
main `main`: add --json-schema / -j flag (#6659) 2024-04-15 18:35:21 +01:00
main-cmake-pkg cuda : rename build flag to LLAMA_CUDA (#6299) 2024-03-26 01:16:01 +01:00
parallel llama : greatly reduce output buffer memory usage (#6122) 2024-03-26 16:46:41 +02:00
passkey llama : fix defrag bugs + add parameter (#5735) 2024-02-27 14:35:51 +02:00
perplexity perplexity : require positive --ctx-size arg (#6695) 2024-04-16 09:28:33 +03:00
quantize chore: Fix markdown warnings (#6625) 2024-04-12 10:52:36 +02:00
quantize-stats refactor : switch to emplace_back to avoid extra object (#5291) 2024-02-03 13:23:37 +02:00
retrieval examples : add "retrieval" (#6193) 2024-03-25 09:38:22 +02:00
save-load-state llama : save and restore kv cache for single seq id (#6341) 2024-04-08 15:43:30 +03:00
server Update server.cpp example with correct startup sequence 2024-04-17 18:55:09 +02:00
simple ggml : add numa options (#5377) 2024-02-16 11:31:07 +02:00
speculative BERT tokenizer fixes (#6498) 2024-04-09 13:44:08 -04:00
sycl fix memcpy() crash, add missed cmd in guide, fix softmax (#6622) 2024-04-14 10:42:29 +08:00
tokenize BERT tokenizer fixes (#6498) 2024-04-09 13:44:08 -04:00
train-text-from-scratch gguf : fix resource leaks (#6061) 2024-03-14 20:29:32 +02:00
CMakeLists.txt eval-callback: Example how to use eval callback for debugging (#6576) 2024-04-11 14:51:07 +02:00
Miku.sh MIKU MAYHEM: Upgrading the Default Model for Maximum Fun 🎉 (#2287) 2023-07-21 11:13:18 +03:00
alpaca.sh alpaca.sh : update model file name (#2074) 2023-07-06 19:17:50 +03:00
base-translate.sh examples : improve base-translate.sh script (#4783) 2024-01-06 11:40:24 +02:00
chat-13B.bat Create chat-13B.bat (#592) 2023-03-29 20:21:09 +03:00
chat-13B.sh examples : read chat prompts from a template file (#1196) 2023-05-03 20:58:11 +03:00
chat-persistent.sh llama : fix session saving/loading (#3400) 2023-10-03 21:04:01 +03:00
chat-vicuna.sh examples : add chat-vicuna.sh (#1854) 2023-06-15 21:05:53 +03:00
chat.sh main : log file (#2748) 2023-08-30 09:29:32 +03:00
gpt4all.sh examples : add -n to alpaca and gpt4all scripts (#706) 2023-04-13 16:03:39 +03:00
json-schema-pydantic-example.py json-schema-to-grammar improvements (+ added to server) (#5978) 2024-03-21 11:50:43 +00:00
json_schema_to_grammar.py JSON schema conversion: ️ faster repetitions, min/maxLength for strings, cap number length (#6555) 2024-04-12 19:43:38 +01:00
llama.vim llama.vim : added api key support (#5090) 2024-01-23 08:51:27 +02:00
llama2-13b.sh gitignore : changes for Poetry users + chat examples (#2284) 2023-07-21 13:53:27 +03:00
llama2.sh gitignore : changes for Poetry users + chat examples (#2284) 2023-07-21 13:53:27 +03:00
llm.vim llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879) 2023-08-30 09:50:55 +03:00
make-ggml.py make-ggml.py : compatibility with more models and GGUF (#3290) 2023-09-27 19:25:12 +03:00
pydantic-models-to-grammar-examples.py examples : make pydantic scripts pass mypy and support py3.8 (#5099) 2024-01-25 14:51:24 -05:00
pydantic_models_to_grammar.py examples : make pydantic scripts pass mypy and support py3.8 (#5099) 2024-01-25 14:51:24 -05:00
reason-act.sh chmod : make scripts executable (#2675) 2023-08-23 17:29:09 +03:00
regex-to-grammar.py JSON schema conversion: ️ faster repetitions, min/maxLength for strings, cap number length (#6555) 2024-04-12 19:43:38 +01:00
server-embd.py server : refactor (#5882) 2024-03-07 11:41:53 +02:00
server-llama2-13B.sh chmod : make scripts executable (#2675) 2023-08-23 17:29:09 +03:00
ts-type-to-grammar.sh JSON schema conversion: ️ faster repetitions, min/maxLength for strings, cap number length (#6555) 2024-04-12 19:43:38 +01:00