llama.cpp

ManniX-ITA d9157cdf34 Update server.cpp example with correct startup sequence. The HTTP listener start and the health API endpoint are moved before model loading starts, so the server can correctly report that it is loading the model.	2024-04-17 18:55:09 +02:00
baby-llama	code : normalize enum names (#5697)	2024-02-25 12:09:09 +02:00
batched	metal : pad n_ctx by 32 (#6177)	2024-03-22 09:36:03 +02:00
batched-bench	bench : make n_batch and n_ubatch configurable in Batched bench (#6500)	2024-04-05 21:34:53 +03:00
batched.swift	ggml : add numa options (#5377)	2024-02-16 11:31:07 +02:00
beam-search	ggml : add numa options (#5377)	2024-02-16 11:31:07 +02:00
benchmark	ggml : remove old quantization functions (#5942)	2024-03-09 15:53:59 +02:00
convert-llama2c-to-ggml	llama2c : open file as binary (#6332)	2024-03-27 09:16:02 +02:00
embedding	BERT tokenizer fixes (#6498)	2024-04-09 13:44:08 -04:00
eval-callback	model: support arch `DbrxForCausalLM` (#6515)	2024-04-13 11:33:52 +02:00
export-lora	ci : add an option to fail on compile warning (#3952)	2024-02-17 23:03:14 +02:00
finetune	code : normalize enum names (#5697)	2024-02-25 12:09:09 +02:00
gbnf-validator	grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609)	2024-04-11 19:47:34 +01:00
gguf	gguf : add option to not check tensor data (#6582)	2024-04-10 21:16:48 +03:00
gguf-split	Fix --split-max-size (#6655)	2024-04-14 13:12:59 +02:00
gritlm	gritlm : add --outdir option to hf.sh script (#6699)	2024-04-16 09:34:06 +03:00
imatrix	imatrix : remove invalid assert (#6632)	2024-04-12 11:49:58 +03:00
infill	infill : add download instructions for model (#6626)	2024-04-12 15:11:46 +03:00
jeopardy	parallel : add option to load external prompt file (#3416)	2023-10-06 16:16:38 +03:00
llama-bench	ggml : add llamafile sgemm (#6414)	2024-04-16 21:55:30 +03:00
llama.android	android : fix utf8 decoding error (#5935)	2024-03-10 22:03:17 +02:00
llama.swiftui	llama : add pipeline parallelism support (#6017)	2024-03-13 18:54:21 +01:00
llava	chore: Fix markdown warnings (#6625)	2024-04-12 10:52:36 +02:00
lookahead	BERT tokenizer fixes (#6498)	2024-04-09 13:44:08 -04:00
lookup	BERT tokenizer fixes (#6498)	2024-04-09 13:44:08 -04:00
main	`main`: add --json-schema / -j flag (#6659)	2024-04-15 18:35:21 +01:00
main-cmake-pkg	cuda : rename build flag to LLAMA_CUDA (#6299)	2024-03-26 01:16:01 +01:00
parallel	llama : greatly reduce output buffer memory usage (#6122)	2024-03-26 16:46:41 +02:00
passkey	llama : fix defrag bugs + add parameter (#5735)	2024-02-27 14:35:51 +02:00
perplexity	perplexity : require positive --ctx-size arg (#6695)	2024-04-16 09:28:33 +03:00
quantize	chore: Fix markdown warnings (#6625)	2024-04-12 10:52:36 +02:00
quantize-stats	refactor : switch to emplace_back to avoid extra object (#5291)	2024-02-03 13:23:37 +02:00
retrieval	examples : add "retrieval" (#6193)	2024-03-25 09:38:22 +02:00
save-load-state	llama : save and restore kv cache for single seq id (#6341)	2024-04-08 15:43:30 +03:00
server	Update server.cpp example with correct startup sequence	2024-04-17 18:55:09 +02:00
simple	ggml : add numa options (#5377)	2024-02-16 11:31:07 +02:00
speculative	BERT tokenizer fixes (#6498)	2024-04-09 13:44:08 -04:00
sycl	fix memcpy() crash, add missed cmd in guide, fix softmax (#6622)	2024-04-14 10:42:29 +08:00
tokenize	BERT tokenizer fixes (#6498)	2024-04-09 13:44:08 -04:00
train-text-from-scratch	gguf : fix resource leaks (#6061)	2024-03-14 20:29:32 +02:00
CMakeLists.txt	eval-callback: Example how to use eval callback for debugging (#6576)	2024-04-11 14:51:07 +02:00
Miku.sh	MIKU MAYHEM: Upgrading the Default Model for Maximum Fun 🎉 (#2287)	2023-07-21 11:13:18 +03:00
alpaca.sh	alpaca.sh : update model file name (#2074)	2023-07-06 19:17:50 +03:00
base-translate.sh	examples : improve base-translate.sh script (#4783)	2024-01-06 11:40:24 +02:00
chat-13B.bat	Create chat-13B.bat (#592)	2023-03-29 20:21:09 +03:00
chat-13B.sh	examples : read chat prompts from a template file (#1196)	2023-05-03 20:58:11 +03:00
chat-persistent.sh	llama : fix session saving/loading (#3400)	2023-10-03 21:04:01 +03:00
chat-vicuna.sh	examples : add chat-vicuna.sh (#1854)	2023-06-15 21:05:53 +03:00
chat.sh	main : log file (#2748)	2023-08-30 09:29:32 +03:00
gpt4all.sh	examples : add -n to alpaca and gpt4all scripts (#706)	2023-04-13 16:03:39 +03:00
json-schema-pydantic-example.py	json-schema-to-grammar improvements (+ added to server) (#5978)	2024-03-21 11:50:43 +00:00
json_schema_to_grammar.py	JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)	2024-04-12 19:43:38 +01:00
llama.vim	llama.vim : added api key support (#5090)	2024-01-23 08:51:27 +02:00
llama2-13b.sh	gitignore : changes for Poetry users + chat examples (#2284)	2023-07-21 13:53:27 +03:00
llama2.sh	gitignore : changes for Poetry users + chat examples (#2284)	2023-07-21 13:53:27 +03:00
llm.vim	llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879)	2023-08-30 09:50:55 +03:00
make-ggml.py	make-ggml.py : compatibility with more models and GGUF (#3290)	2023-09-27 19:25:12 +03:00
pydantic-models-to-grammar-examples.py	examples : make pydantic scripts pass mypy and support py3.8 (#5099)	2024-01-25 14:51:24 -05:00
pydantic_models_to_grammar.py	examples : make pydantic scripts pass mypy and support py3.8 (#5099)	2024-01-25 14:51:24 -05:00
reason-act.sh	chmod : make scripts executable (#2675)	2023-08-23 17:29:09 +03:00
regex-to-grammar.py	JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)	2024-04-12 19:43:38 +01:00
server-embd.py	server : refactor (#5882)	2024-03-07 11:41:53 +02:00
server-llama2-13B.sh	chmod : make scripts executable (#2675)	2023-08-23 17:29:09 +03:00
ts-type-to-grammar.sh	JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)	2024-04-12 19:43:38 +01:00