llama.cpp

Commit Graph

Author	SHA1	Message	Date
teleprint-me	12285b5325	chore: Map model file and vocab types	2024-05-22 02:58:12 -04:00
teleprint-me	0b43e14030	refactor: Add experimental mapping for BPE pre-tokenizers	2024-05-21 22:45:45 -04:00
teleprint-me	34e14ae96d	refactor: Add experimental model mappings	2024-05-21 19:11:51 -04:00
teleprint-me	b2aac685d5	docs: Fix comment	2024-05-21 16:07:12 -04:00
teleprint-me	83b9fcd3e4	refactor: Rename constants to reduce confusion between references	2024-05-21 16:06:39 -04:00
teleprint-me	2fe28ad4d3	chore: Rename from repo to model repo and reorder for improved readability	2024-05-21 01:41:35 -04:00
teleprint-me	4768650aff	chore: Add formatting, set common vocab files, apply pattern to model map	2024-05-21 01:38:29 -04:00
teleprint-me	fb32f50834	feat: Add hf model mapping descriptors for each repo	2024-05-21 01:07:13 -04:00
teleprint-me	a35b76755f	Merge branch 'master' into auto-model-support	2024-05-21 00:16:34 -04:00
teleprint-me	aed0573f68	proto: Add experimental vocab pre-tokenizer regular expressions	2024-05-21 00:14:26 -04:00
teleprint-me	5978bb007d	chore: Fix and update comments	2024-05-20 14:59:40 -04:00
teleprint-me	2fa2c7a86c	chore: Move enums and model map to constants	2024-05-20 14:51:03 -04:00
teleprint-me	d9ba963cd4	refactor: Restructure tokenizer model metadata	2024-05-20 14:42:59 -04:00
teleprint-me	18bb36e496	chore: Allow the user to config the logger	2024-05-20 14:06:21 -04:00
Georgi Gerganov	fabf30b4c4	llama : remove Persimmon (#7408) * llama : remove Persimmon * requirements : remove	2024-05-21 02:35:28 +10:00
teleprint-me	bdd0286bd0	refactor: Use proper names for referenced member variables	2024-05-20 01:39:09 -04:00
teleprint-me	a1951e27dc	refactor: Add proper names for remote model references	2024-05-20 01:36:44 -04:00
teleprint-me	6fc4492b3f	chore: Add english pangram to vocab tests	2024-05-20 00:51:35 -04:00
teleprint-me	381dad5eb3	fix: Add missing model architectures	2024-05-20 00:50:42 -04:00
teleprint-me	9a2834e24e	fix: Use __name__ as logger name	2024-05-19 22:39:30 -04:00
teleprint-me	a0362ea475	patch: Fix nested quotes for dict refs	2024-05-19 22:39:05 -04:00
teleprint-me	89a46fe818	feat: Attempt to mirror the llama.cpp API for compatibility	2024-05-19 22:31:05 -04:00
teleprint-me	316b404d94	patch: Fix CLI option for generating vocab tests	2024-05-18 23:59:22 -04:00
teleprint-me	da5deebda1	fix: Apply fix to verbose help description and generating vocab tests option	2024-05-18 23:34:33 -04:00
teleprint-me	bd32266c87	feat: Add function for generating vocab script and fix CLI opts	2024-05-18 22:14:58 -04:00
teleprint-me	0479e9695f	patch: Add exception handling for non-existent vocab related files	2024-05-18 22:14:19 -04:00
teleprint-me	1a82573126	feat: Add example script for automating generating tokenizer model checksums and tests	2024-05-18 20:49:22 -04:00
teleprint-me	006bb60d27	chore: Fix model path references	2024-05-18 19:20:19 -04:00
teleprint-me	b6f70b8a0e	chore: Fix line spacing	2024-05-18 16:59:20 -04:00
teleprint-me	832b449cbd	feat: Add pre-tokenizer CLI tooling	2024-05-18 14:33:56 -04:00
teleprint-me	04fb7886c5	chore: Apply isort to package gguf init	2024-05-18 14:33:22 -04:00
teleprint-me	2ef73ee6e4	refactor: Apply SoC for HF requests, vocab, and weights	2024-05-18 13:45:21 -04:00
teleprint-me	5eda2c9485	feat: Add pre-tokenizer logging	2024-05-18 13:21:22 -04:00
teleprint-me	b2ca23c746	feat: Add method for generating the checksums and writing the results to a json file	2024-05-18 01:46:13 -04:00
teleprint-me	302258721b	refactor: Apply model schema to tokenizer downloads - Add imports for json and hashlib - Add missing models: phi, stablelm, mistral, and mixtral - Fix constructor logic - Fix how models are accessed - Apply model schema to download_model method	2024-05-18 01:26:39 -04:00
teleprint-me	f7515abf49	feat: Add tokenizer types, model types, and model repos	2024-05-18 00:37:19 -04:00
teleprint-me	3ba01c7a0e	chore: Fix spacing	2024-05-18 00:10:42 -04:00
teleprint-me	1a286c8e21	refactor: Clean up variable names and separate concerns when downloading tokenizers	2024-05-17 23:27:30 -04:00
teleprint-me	5c8144e645	feat: Add download_model method and fix references for clarity to mitigate confusion	2024-05-17 23:00:12 -04:00
teleprint-me	4790f76740	feat: Add prototype for requesting vocab related files	2024-05-17 21:08:39 -04:00
teleprint-me	98cf788990	patch: Apply minor fixes for handling headers and writing content	2024-05-17 21:07:51 -04:00
teleprint-me	742abebb39	refactor: Add log for status and fix url path variable name	2024-05-17 20:37:59 -04:00
teleprint-me	ba13d64bb3	feat: Add utils for logging and writing when interacting with HuggingFaceHub	2024-05-17 20:26:21 -04:00
teleprint-me	dbdf6c2b1d	feat: Add prototype for managing huggingface hub content	2024-05-17 20:00:48 -04:00
compilade	ee52225067	convert-hf : support direct Q8_0 conversion (#7234) * convert-hf : support q8_0 conversion * convert-hf : add missing ftype This was messing with the checksums otherwise. * convert-hf : add missing ftype to Baichuan and Xverse I didn't notice these on my first pass.	2024-05-13 14:10:51 -04:00
compilade	5a419926b0	convert-hf : support bfloat16 conversion (#7158) * convert-hf : support bfloat16 conversion * gguf-py : flake8 fixes * convert-hf : add missing space after comma * convert-hf : get bit-exact same output as ./quantize The quantization version was missing. * convert-hf : don't round bf16 NANs * convert-hf : save some memory with np.int16 intermediate bf16 weights * convert-hf : more closely match llama.cpp with which weights to keep in f32 * convert-hf : add --outtype auto-f16 A reason for this to exist is for model quantizers who want an initial GGUF with the most fidelity to the original model while still using a 16-bit float type instead of 32-bit floats. * convert-hf : remove a semicolon because flake8 doesn't like it It's a reflex from when programming in C/C++, I guess. * convert-hf : support outtype templating in outfile name * convert-hf : rename --outtype auto-f16 to --outtype auto	2024-05-11 11:06:26 -04:00
Joan Fontanals	b83cc3f5b3	llama : add Jina Embeddings architecture (#6826) * feat: first things to do * feat: create tensors for Jina architecture * fix: use other tensors * feat: embedding gets results * fix: fix usage of ALIBI * fix: clean prints * fix: do some cleanup unused vars * fix: revert changes to Makefile and CMakeLists * fix: revert some changes * fix: fix small detail * fix: fix convert formatting * fix: fix linting and editor * feat: set proper vocab settings * fix: JinaBertForMaskedLM registration * feat: support q_normalization and k_normalization in Jina arch * feat: handle gpt2 tokenizer with Jina architecture * feat: example comments in embedding * feat: rename Jina Bert to Jina Bert V2 * fix: add some changes as per review * feat: proper KQ_pos for Jina embeddings * feat: add capacity to load models ES and DE for Spanish * llama : fix pre-tokenizers * ggml : full ALiBi support * ggml : update ggml_soft_max_ext() CUDA, SYCL * ggml : ggml_flash_attn_ext() support ALiBi (CPU) * ggml : ggml_flash_attn_ext() support ALiBi (Metal) * ggml : fix warning * ggml : ggml_flash_attn_ext() support ALiBi (CUDA) ggml-ci * minor : clean-up * embedding : add warning about missing SEP --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-11 10:46:09 +03:00
Georgi Gerganov	9cb317f77e	ggml : full ALiBi support (#7192) * ggml : full ALiBi support * ggml : update ggml_soft_max_ext() CUDA, SYCL * ggml : ggml_flash_attn_ext() support ALiBi (CPU) * ggml : ggml_flash_attn_ext() support ALiBi (Metal) * ggml : fix warning * ggml : ggml_flash_attn_ext() support ALiBi (CUDA) ggml-ci * ggml : fix assert message * vulkan : add dev notes * ggml : require mask when using ALiBi ggml-ci * convert : fix convert for refact models	2024-05-11 10:32:41 +03:00
Sigbjørn Skjæret	22842164bc	gguf-py : add special token modification capability (#7166 ) * Add special token modification capability To be able to fix/amend special tokens in a GGUF let's add two new arguments: * `--special-token <name> <value>` where `<name>` can be bos, eos, prefix, middle, etc. while `<value>` is the token value, f.ex. `"<｜fim▁begin｜>"` * `--special-token-by-id <name> <id>` where `<id>` is the ID of the token, f.ex. 32006 So, in order to f.ex. add fill-in-middle tokens to a GGUF you would do the following: ```bash python3 gguf-new-metadata.py input.gguf output.gguf --special-token prefix "<｜fim▁begin｜>" --special-token middle "<｜fim▁hole｜>" --special-token suffix "<｜fim▁end｜>" ``` * improve help text * flake-- * fix multiple tokens warning * make script executable * switch to namedtuple, no need to dataclass * typing++ * add progress bar * Add special token modification capability To be able to fix/amend special tokens in a GGUF let's add two new arguments: * `--special-token <name> <value>` where `<name>` can be bos, eos, prefix, middle, etc. while `<value>` is the token value, f.ex. `"<｜fim▁begin｜>"` * `--special-token-by-id <name> <id>` where `<id>` is the ID of the token, f.ex. 32006 So, in order to f.ex. add fill-in-middle tokens to a GGUF you would do the following: ```bash gguf-new-metadata.py input.gguf output.gguf --special-token prefix "<｜fim▁begin｜>" --special-token middle "<｜fim▁end｜>" --special-token suffix "<｜fim▁hole｜>" ``` (yes, fim_end is the `middle` token, because completion is a `prefix`/`suffix`/`middle` sequence (where `middle` is unfilled)) or ```bash gguf-new-metadata.py input.gguf output.gguf --special-token prefix "<fim_prefix>" --special-token middle "<fim_middle>" --special-token suffix "<fim_suffix>" ``` etc... NB: The tokens have to exist already, trying to add non-existent token name/IDs will be ignored (with a warning), while non-existent values will fail (with an error). 
* improve help text * flake-- * fix multiple tokens warning * make script executable * switch to namedtuple, no need to dataclass * typing++ * add progress bar * fail on invalid token id	2024-05-09 13:56:00 +03:00
compilade	f98eb31c51	convert-hf : save memory with lazy evaluation (#7075 ) * convert-hf : begin refactoring write_tensor * convert : upgrade to sentencepiece v0.2.0 * convert-hf : remove unused n_dims in extra__tensors convert-hf : simplify MoE weights stacking * convert-hf : flake8 linter doesn't like semicolons * convert-hf : allow unusual model part names For example, loading `model-00001-of-00001.safetensors` now works. * convert-hf : fix stacking MoE expert tensors `torch.stack` and `torch.cat` don't do the same thing. * convert-hf : fix Mamba conversion Tested to work even with a SentencePiece-based tokenizer. * convert : use a string for the SentencePiece tokenizer path * convert-hf : display tensor shape * convert-hf : convert norms to f32 by default * convert-hf : sort model part names `os.listdir` is said to list files in arbitrary order. Sorting the file names should let "model-00009-of-00042.safetensors" be loaded before "model-00010-of-00042.safetensors". * convert-hf : use an ABC for Model again It seems Protocol can't be used as a statically type-checked ABC, because its subclasses also can't be instantiated. (why did it seem to work?) At least there's still a way to throw an error when forgetting to define the `model_arch` property of any registered Model subclasses. * convert-hf : use a plain class for Model, and forbid direct instantiation There are no abstract methods used anyway, so using ABC isn't really necessary. 
* convert-hf : more consistent formatting of cmdline args * convert-hf : align the message logged for converted tensors * convert-hf : fix Refact conversion * convert-hf : save memory with lazy evaluation * convert-hf : flake8 doesn't like lowercase L as a variable name * convert-hf : remove einops requirement for InternLM2 * convert-hf : faster model parts loading Instead of pre-loading them all into a dict, iterate on the tensors in the model parts progressively as needed in Model.write_tensors Conversion for some architectures relies on checking for the presence of specific tensor names, so for multi-part models, the weight map is read from the relevant json file to quickly get these names up-front. * convert-hf : minor changes for consistency * gguf-py : add tqdm as a dependency It's small, and used for a progress bar in GGUFWriter.write_tensors_to_file	2024-05-08 18:16:38 -04:00

134 Commits