llama.cpp

HanishKVC 6597fafeae SimpleChat: Make vertical layout better responsive (flex based). Also needed to keep things clean and properly usable in both landscape and portrait, after changing to a multiline textarea rather than single-line user input. Avoid hardcoding the height of the chat-till-now display area; instead make it flex-growable within a flex column of UI elements inside a fixed vertical area.		2024-05-20 16:31:13 +05:30
..
baby-llama	code : normalize enum names (#5697)	2024-02-25 12:09:09 +02:00
batched	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
batched-bench	ggml : add Flash Attention (#5021)	2024-04-30 12:16:08 +03:00
batched.swift	llama : add option to render special/control tokens (#6807)	2024-04-21 18:36:45 +03:00
beam-search	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
benchmark	ggml : remove old quantization functions (#5942)	2024-03-09 15:53:59 +02:00
convert-llama2c-to-ggml	TypoFix (#7162)	2024-05-09 10:16:45 +02:00
embedding	embedding : free the batch after execution (#7297)	2024-05-15 15:01:12 +03:00
eval-callback	eval-callback : fix conversion to float (#7184)	2024-05-10 01:04:12 +02:00
export-lora	ci : add an option to fail on compile warning (#3952)	2024-02-17 23:03:14 +02:00
finetune	ggml : introduce bfloat16 support (#6412)	2024-05-08 09:30:09 +03:00
gbnf-validator	grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609)	2024-04-11 19:47:34 +01:00
gguf	gguf : add option to not check tensor data (#6582)	2024-04-10 21:16:48 +03:00
gguf-split	gguf-split: add --no-tensor-first-split (#7072)	2024-05-04 18:56:22 +02:00
gritlm	gritlm : add --outdir option to hf.sh script (#6699)	2024-04-16 09:34:06 +03:00
imatrix	Fixed save_imatrix to match old behaviour for MoE (#7099)	2024-05-08 02:24:16 +02:00
infill	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
jeopardy	parallel : add option to load external prompt file (#3416)	2023-10-06 16:16:38 +03:00
llama-bench	llama-bench : add pp+tg test type (#7199)	2024-05-10 18:03:54 +02:00
llama.android	Revert "move ndk code to a new library (#6951)" (#7282)	2024-05-14 16:10:39 +03:00
llama.swiftui	llama : add option to render special/control tokens (#6807)	2024-04-21 18:36:45 +03:00
llava	ggml : tag ggml_tensor::backend as deprecated (#7290)	2024-05-15 15:08:48 +02:00
lookahead	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
lookup	Server: fix seed for multiple slots (#6835)	2024-04-24 11:08:36 +02:00
main	Fix memory bug in grammar parser (#7194)	2024-05-10 21:01:08 +10:00
main-cmake-pkg	build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964)	2024-04-29 17:02:45 +01:00
parallel	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
passkey	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
perplexity	perplexity: add BF16 vs. FP16 results (#7150)	2024-05-13 13:03:27 +02:00
quantize	doc: add references to hugging face GGUF-my-repo quantisation web tool. (#7288)	2024-05-16 15:38:43 +10:00
quantize-stats	Improve usability of --model-url & related flags (#6930)	2024-04-30 00:52:50 +01:00
retrieval	examples : add "retrieval" (#6193)	2024-03-25 09:38:22 +02:00
rpc	rpc : set SO_REUSEADDR for the server socket (#7320)	2024-05-17 17:25:44 +03:00
save-load-state	llama : save and restore kv cache for single seq id (#6341)	2024-04-08 15:43:30 +03:00
server	SimpleChat: Make vertical layout better responsive (flex based)	2024-05-20 16:31:13 +05:30
simple	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
speculative	llama : support Llama 3 HF conversion (#6745)	2024-04-21 14:50:41 +03:00
sycl	docs: fix typos (#7124)	2024-05-07 18:20:33 +03:00
tokenize	BERT tokenizer fixes (#6498)	2024-04-09 13:44:08 -04:00
train-text-from-scratch	train : add general name (#6752)	2024-04-19 10:16:45 +03:00
CMakeLists.txt	ggml : add RPC backend (#6829)	2024-05-14 14:27:19 +03:00
Miku.sh	…
alpaca.sh	…
base-translate.sh	examples : improve base-translate.sh script (#4783)	2024-01-06 11:40:24 +02:00
chat-13B.bat	…
chat-13B.sh	…
chat-persistent.sh	llama : fix session saving/loading (#3400)	2023-10-03 21:04:01 +03:00
chat-vicuna.sh	…
chat.sh	…
gpt4all.sh	…
json-schema-pydantic-example.py	json-schema-to-grammar improvements (+ added to server) (#5978)	2024-03-21 11:50:43 +00:00
json_schema_to_grammar.py	JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)	2024-04-12 19:43:38 +01:00
llama.vim	llama.vim : added api key support (#5090)	2024-01-23 08:51:27 +02:00
llama2-13b.sh	…
llama2.sh	…
llm.vim	…
make-ggml.py	make-ggml.py : compatibility with more models and GGUF (#3290)	2023-09-27 19:25:12 +03:00
pydantic-models-to-grammar-examples.py	examples : make pydantic scripts pass mypy and support py3.8 (#5099)	2024-01-25 14:51:24 -05:00
pydantic_models_to_grammar.py	examples : make pydantic scripts pass mypy and support py3.8 (#5099)	2024-01-25 14:51:24 -05:00
reason-act.sh	…
regex-to-grammar.py	JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)	2024-04-12 19:43:38 +01:00
server-embd.py	server : refactor (#5882)	2024-03-07 11:41:53 +02:00
server-llama2-13B.sh	…
ts-type-to-grammar.sh	JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)	2024-04-12 19:43:38 +01:00