llama.cpp

HanishKVC 01c8db70f7 ChatON+Main: Add C_API wrapper for single		2024-05-06 11:27:56 +05:30
Add a C API wrapper for the single-message tagging scenario. In turn, to match the convention followed by the existing chat_apply_template code, make it return the expected size of the tagged message string buffer. Update the internal single logic to support the same. Explicitly check whether the specified tmpl is available in the loaded JSON, and return an error if it is not found.
baby-llama	code : normalize enum names (#5697)	2024-02-25 12:09:09 +02:00
batched	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
batched-bench	ggml : add Flash Attention (#5021)	2024-04-30 12:16:08 +03:00
batched.swift	llama : add option to render special/control tokens (#6807)	2024-04-21 18:36:45 +03:00
beam-search	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
benchmark	ggml : remove old quantization functions (#5942)	2024-03-09 15:53:59 +02:00
convert-llama2c-to-ggml	llama2c : open file as binary (#6332)	2024-03-27 09:16:02 +02:00
embedding	BERT tokenizer fixes (#6498)	2024-04-09 13:44:08 -04:00
eval-callback	model: support arch `DbrxForCausalLM` (#6515)	2024-04-13 11:33:52 +02:00
export-lora	ci : add an option to fail on compile warning (#3952)	2024-02-17 23:03:14 +02:00
finetune	code : normalize enum names (#5697)	2024-02-25 12:09:09 +02:00
gbnf-validator	grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609)	2024-04-11 19:47:34 +01:00
gguf	gguf : add option to not check tensor data (#6582)	2024-04-10 21:16:48 +03:00
gguf-split	gguf-split: add --no-tensor-first-split (#7072)	2024-05-04 18:56:22 +02:00
gritlm	gritlm : add --outdir option to hf.sh script (#6699)	2024-04-16 09:34:06 +03:00
imatrix	quantize: add imatrix and dataset metadata in GGUF (#6658)	2024-04-26 20:06:33 +02:00
infill	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
jeopardy	parallel : add option to load external prompt file (#3416)	2023-10-06 16:16:38 +03:00
llama-bench	Adding support for the --numa argument for llama-bench. (#7080)	2024-05-05 14:17:47 +02:00
llama.android	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
llama.swiftui	llama : add option to render special/control tokens (#6807)	2024-04-21 18:36:45 +03:00
llava	llava-cli : multiple images (#6969)	2024-04-29 17:34:24 +03:00
lookahead	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
lookup	Server: fix seed for multiple slots (#6835)	2024-04-24 11:08:36 +02:00
main	ChatON+Main: Add C_API wrapper for single	2024-05-06 11:27:56 +05:30
main-cmake-pkg	build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964)	2024-04-29 17:02:45 +01:00
parallel	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
passkey	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
perplexity	perplexity: more statistics, added documentation (#6936)	2024-04-30 23:36:27 +02:00
quantize	quantize: add imatrix and dataset metadata in GGUF (#6658)	2024-04-26 20:06:33 +02:00
quantize-stats	Improve usability of --model-url & related flags (#6930)	2024-04-30 00:52:50 +01:00
retrieval	examples : add "retrieval" (#6193)	2024-03-25 09:38:22 +02:00
save-load-state	llama : save and restore kv cache for single seq id (#6341)	2024-04-08 15:43:30 +03:00
server	If first token generated from the server is the stop word the server will crash (#7038)	2024-05-04 11:06:40 +02:00
simple	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
speculative	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
sycl	fix memcpy() crash, add missed cmd in guide, fix softmax (#6622)	2024-04-14 10:42:29 +08:00
tokenize	BERT tokenizer fixes (#6498)	2024-04-09 13:44:08 -04:00
train-text-from-scratch	train : add general name (#6752)	2024-04-19 10:16:45 +03:00
CMakeLists.txt	eval-callback: Example how to use eval callback for debugging (#6576)	2024-04-11 14:51:07 +02:00
Miku.sh	…
alpaca.sh	…
base-translate.sh	examples : improve base-translate.sh script (#4783)	2024-01-06 11:40:24 +02:00
chat-13B.bat	…
chat-13B.sh	…
chat-persistent.sh	llama : fix session saving/loading (#3400)	2023-10-03 21:04:01 +03:00
chat-vicuna.sh	…
chat.sh	main : log file (#2748)	2023-08-30 09:29:32 +03:00
chaton_meta.json	ChatON: Update to new detailed format wrt llama2 and llama3	2024-05-06 11:27:56 +05:30
chaton_meta.old_simple.json	ChatON: Backup the current simple meta json file	2024-05-06 11:27:56 +05:30
gpt4all.sh	…
json-schema-pydantic-example.py	json-schema-to-grammar improvements (+ added to server) (#5978)	2024-03-21 11:50:43 +00:00
json_schema_to_grammar.py	JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)	2024-04-12 19:43:38 +01:00
llama.vim	llama.vim : added api key support (#5090)	2024-01-23 08:51:27 +02:00
llama2-13b.sh	gitignore : changes for Poetry users + chat examples (#2284)	2023-07-21 13:53:27 +03:00
llama2.sh	gitignore : changes for Poetry users + chat examples (#2284)	2023-07-21 13:53:27 +03:00
llm.vim	llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879)	2023-08-30 09:50:55 +03:00
make-ggml.py	make-ggml.py : compatibility with more models and GGUF (#3290)	2023-09-27 19:25:12 +03:00
pydantic-models-to-grammar-examples.py	examples : make pydantic scripts pass mypy and support py3.8 (#5099)	2024-01-25 14:51:24 -05:00
pydantic_models_to_grammar.py	examples : make pydantic scripts pass mypy and support py3.8 (#5099)	2024-01-25 14:51:24 -05:00
reason-act.sh	chmod : make scripts executable (#2675)	2023-08-23 17:29:09 +03:00
regex-to-grammar.py	JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)	2024-04-12 19:43:38 +01:00
server-embd.py	server : refactor (#5882)	2024-03-07 11:41:53 +02:00
server-llama2-13B.sh	chmod : make scripts executable (#2675)	2023-08-23 17:29:09 +03:00
ts-type-to-grammar.sh	JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)	2024-04-12 19:43:38 +01:00