llama.cpp

Commit Graph

Author	SHA1	Message	Date
teleprint-me	cd00be886f	chore: Add model metadata	2024-05-22 19:59:13 -04:00
teleprint-me	1957ca41f2	refactor: Simplify BPE pre-tokenizer mapping	2024-05-22 16:57:29 -04:00
teleprint-me	12285b5325	chore: Map model file and vocab types	2024-05-22 02:58:12 -04:00
teleprint-me	0b43e14030	refactor: Add experimental mapping for BPE pre-tokenizers	2024-05-21 22:45:45 -04:00
teleprint-me	34e14ae96d	refactor: Add experimental model mappings	2024-05-21 19:11:51 -04:00
teleprint-me	b2aac685d5	docs: Fix comment	2024-05-21 16:07:12 -04:00
teleprint-me	83b9fcd3e4	refactor: Rename constants to reduce confusion between references	2024-05-21 16:06:39 -04:00
teleprint-me	2fe28ad4d3	chore: Rename from repo to model repo and reorder for improved readability	2024-05-21 01:41:35 -04:00
teleprint-me	4768650aff	chore: Add formatting, set common vocab files, apply pattern to model map	2024-05-21 01:38:29 -04:00
teleprint-me	fb32f50834	feat: Add hf model mapping descriptors for each repo	2024-05-21 01:07:13 -04:00
teleprint-me	a35b76755f	Merge branch 'master' into auto-model-support	2024-05-21 00:16:34 -04:00
teleprint-me	aed0573f68	proto: Add experimental vocab pre-tokenizer regular expressions	2024-05-21 00:14:26 -04:00
teleprint-me	5978bb007d	chore: Fix and update comments	2024-05-20 14:59:40 -04:00
teleprint-me	2fa2c7a86c	chore: Move enums and model map to constants	2024-05-20 14:51:03 -04:00
teleprint-me	d9ba963cd4	refactor: Restructure tokenizer model metadata	2024-05-20 14:42:59 -04:00
teleprint-me	18bb36e496	chore: Allow the user to config the logger	2024-05-20 14:06:21 -04:00
Georgi Gerganov	fabf30b4c4	llama : remove Persimmon (#7408) * llama : remove Persimmon * requirements : remove	2024-05-21 02:35:28 +10:00
teleprint-me	bdd0286bd0	refactor: Use proper names for referenced member variables	2024-05-20 01:39:09 -04:00
teleprint-me	a1951e27dc	refactor: Add proper names for remote model references	2024-05-20 01:36:44 -04:00
teleprint-me	6fc4492b3f	chore: Add english pangram to vocab tests	2024-05-20 00:51:35 -04:00
teleprint-me	381dad5eb3	fix: Add missing model architectures	2024-05-20 00:50:42 -04:00
teleprint-me	9a2834e24e	fix: Use __name__ as logger name	2024-05-19 22:39:30 -04:00
teleprint-me	a0362ea475	patch: Fix nested quotes for dict refs	2024-05-19 22:39:05 -04:00
teleprint-me	89a46fe818	feat: Attempt to mirror the llama.cpp API for compatibility	2024-05-19 22:31:05 -04:00
teleprint-me	316b404d94	patch: Fix CLI option for generating vocab tests	2024-05-18 23:59:22 -04:00
teleprint-me	da5deebda1	fix: Apply fix to verbose help description and generating vocab tests option	2024-05-18 23:34:33 -04:00
teleprint-me	bd32266c87	feat: Add function for generating vocab script and fix CLI opts	2024-05-18 22:14:58 -04:00
teleprint-me	0479e9695f	patch: Add exception handling for non-existent vocab related files	2024-05-18 22:14:19 -04:00
teleprint-me	1a82573126	feat: Add example script for automating generating tokenizer model checksums and tests	2024-05-18 20:49:22 -04:00
teleprint-me	006bb60d27	chore: Fix model path references	2024-05-18 19:20:19 -04:00
teleprint-me	b6f70b8a0e	chore: Fix line spacing	2024-05-18 16:59:20 -04:00
teleprint-me	832b449cbd	feat: Add pre-tokenizer CLI tooling	2024-05-18 14:33:56 -04:00
teleprint-me	04fb7886c5	chore: Apply isort to package gguf init	2024-05-18 14:33:22 -04:00
teleprint-me	2ef73ee6e4	refactor: Apply SoC for HF requests, vocab, and weights	2024-05-18 13:45:21 -04:00
teleprint-me	5eda2c9485	feat: Add pre-tokenizer logging	2024-05-18 13:21:22 -04:00
teleprint-me	b2ca23c746	feat: Add method for generating the checksums and writing the results to a json file	2024-05-18 01:46:13 -04:00
teleprint-me	302258721b	refactor: Apply model schema to tokenizer downloads - Add imports for json and hashlib - Add missing models: phi, stablelm, mistral, and mixtral - Fix constructor logic - Fix how models are accessed - Apply model schema to download_model method	2024-05-18 01:26:39 -04:00
teleprint-me	f7515abf49	feat: Add tokenizer types, model types, and model repos	2024-05-18 00:37:19 -04:00
teleprint-me	3ba01c7a0e	chore: Fix spacing	2024-05-18 00:10:42 -04:00
teleprint-me	1a286c8e21	refactor: Clean up variable names and separate concerns when downloading tokenizers	2024-05-17 23:27:30 -04:00
teleprint-me	5c8144e645	feat: Add download_model method and fix references for clarity to mitigate confusion	2024-05-17 23:00:12 -04:00
teleprint-me	4790f76740	feat: Add prototype for requesting vocab related files	2024-05-17 21:08:39 -04:00
teleprint-me	98cf788990	patch: Apply minor fixes for handling headers and writing content	2024-05-17 21:07:51 -04:00
teleprint-me	742abebb39	refactor: Add log for status and fix url path variable name	2024-05-17 20:37:59 -04:00
teleprint-me	ba13d64bb3	feat: Add utils for logging and writing when interacting with HuggingFaceHub	2024-05-17 20:26:21 -04:00
teleprint-me	dbdf6c2b1d	feat: Add prototype for managing huggingface hub content	2024-05-17 20:00:48 -04:00
compilade	ee52225067	convert-hf : support direct Q8_0 conversion (#7234) * convert-hf : support q8_0 conversion * convert-hf : add missing ftype This was messing with the checksums otherwise. * convert-hf : add missing ftype to Baichuan and Xverse I didn't notice these on my first pass.	2024-05-13 14:10:51 -04:00
compilade	5a419926b0	convert-hf : support bfloat16 conversion (#7158) * convert-hf : support bfloat16 conversion * gguf-py : flake8 fixes * convert-hf : add missing space after comma * convert-hf : get bit-exact same output as ./quantize The quantization version was missing. * convert-hf : don't round bf16 NANs * convert-hf : save some memory with np.int16 intermediate bf16 weights * convert-hf : more closely match llama.cpp with which weights to keep in f32 * convert-hf : add --outtype auto-f16 A reason for this to exist is for model quantizers who want an initial GGUF with the most fidelity to the original model while still using a 16-bit float type instead of 32-bit floats. * convert-hf : remove a semicolon because flake8 doesn't like it It's a reflex from when programming in C/C++, I guess. * convert-hf : support outtype templating in outfile name * convert-hf : rename --outtype auto-f16 to --outtype auto	2024-05-11 11:06:26 -04:00
Joan Fontanals	b83cc3f5b3	llama : add Jina Embeddings architecture (#6826) * feat: first things to do * feat: create tensors for Jina architecture * fix: use other tensors * feat: embedding gets results * fix: fix usage of ALIBI * fix: clean prints * fix: do some cleanup unused vars * fix: revert changes to Makefile and CMakeLists * fix: revert some changes * fix: fix small detail * fix: fix convert formatting * fix: fix linting and editor * feat: set proper vocab settings * fix: JinaBertForMaskedLM registration * feat: support q_normalization and k_normalization in Jina arch * feat: handle gpt2 tokenizer with Jina architecture * feat: example comments in embedding * feat: rename Jina Bert to Jina Bert V2 * fix: add some changes as per review * feat: proper KQ_pos for Jina embeddings * feat: add capacity to load models ES and DE for Spanish * llama : fix pre-tokenizers * ggml : full ALiBi support * ggml : update ggml_soft_max_ext() CUDA, SYCL * ggml : ggml_flash_attn_ext() support ALiBi (CPU) * ggml : ggml_flash_attn_ext() support ALiBi (Metal) * ggml : fix warning * ggml : ggml_flash_attn_ext() support ALiBi (CUDA) ggml-ci * minor : clean-up * embedding : add warning about missing SEP --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-11 10:46:09 +03:00
Georgi Gerganov	9cb317f77e	ggml : full ALiBi support (#7192) * ggml : full ALiBi support * ggml : update ggml_soft_max_ext() CUDA, SYCL * ggml : ggml_flash_attn_ext() support ALiBi (CPU) * ggml : ggml_flash_attn_ext() support ALiBi (Metal) * ggml : fix warning * ggml : ggml_flash_attn_ext() support ALiBi (CUDA) ggml-ci * ggml : fix assert message * vulkan : add dev notes * ggml : require mask when using ALiBi ggml-ci * convert : fix convert for refact models	2024-05-11 10:32:41 +03:00

136 Commits