Commit Graph

136 Commits

Author SHA1 Message Date
teleprint-me cd00be886f
chore: Add model metadata 2024-05-22 19:59:13 -04:00
teleprint-me 1957ca41f2
refactor: Simplify BPE pre-tokenizer mapping 2024-05-22 16:57:29 -04:00
teleprint-me 12285b5325
chore: Map model file and vocab types 2024-05-22 02:58:12 -04:00
teleprint-me 0b43e14030
refactor: Add experimental mapping for BPE pre-tokenizers 2024-05-21 22:45:45 -04:00
teleprint-me 34e14ae96d
refactor: Add experimental model mappings 2024-05-21 19:11:51 -04:00
teleprint-me b2aac685d5
docs: Fix comment 2024-05-21 16:07:12 -04:00
teleprint-me 83b9fcd3e4
refactor: Rename constants to reduce confusion between references 2024-05-21 16:06:39 -04:00
teleprint-me 2fe28ad4d3
chore: Rename from repo to model repo and reorder for improved readability 2024-05-21 01:41:35 -04:00
teleprint-me 4768650aff
chore: Add formatting, set common vocab files, apply pattern to model map 2024-05-21 01:38:29 -04:00
teleprint-me fb32f50834
feat: Add hf model mapping descriptors for each repo 2024-05-21 01:07:13 -04:00
teleprint-me a35b76755f
Merge branch 'master' into auto-model-support 2024-05-21 00:16:34 -04:00
teleprint-me aed0573f68
proto: Add experimental vocab pre-tokenizer regular expressions 2024-05-21 00:14:26 -04:00
teleprint-me 5978bb007d
chore: Fix and update comments 2024-05-20 14:59:40 -04:00
teleprint-me 2fa2c7a86c
chore: Move enums and model map to constants 2024-05-20 14:51:03 -04:00
teleprint-me d9ba963cd4
refactor: Restructure tokenizer model metadata 2024-05-20 14:42:59 -04:00
teleprint-me 18bb36e496
chore: Allow the user to config the logger 2024-05-20 14:06:21 -04:00
Georgi Gerganov fabf30b4c4
llama : remove Persimmon (#7408)
* llama : remove Persimmon

* requirements : remove
2024-05-21 02:35:28 +10:00
teleprint-me bdd0286bd0
refactor: Use proper names for referenced member variables 2024-05-20 01:39:09 -04:00
teleprint-me a1951e27dc
refactor: Add proper names for remote model references 2024-05-20 01:36:44 -04:00
teleprint-me 6fc4492b3f
chore: Add english pangram to vocab tests 2024-05-20 00:51:35 -04:00
teleprint-me 381dad5eb3
fix: Add missing model architectures 2024-05-20 00:50:42 -04:00
teleprint-me 9a2834e24e
fix: Use __name__ as logger name 2024-05-19 22:39:30 -04:00
teleprint-me a0362ea475
patch: Fix nested quotes for dict refs 2024-05-19 22:39:05 -04:00
teleprint-me 89a46fe818
feat: Attempt to mirror the llama.cpp API for compatibility 2024-05-19 22:31:05 -04:00
teleprint-me 316b404d94
patch: Fix CLI option for generating vocab tests 2024-05-18 23:59:22 -04:00
teleprint-me da5deebda1
fix: Apply fix to verbose help description and generating vocab tests option 2024-05-18 23:34:33 -04:00
teleprint-me bd32266c87
feat: Add function for generating vocab script and fix CLI opts 2024-05-18 22:14:58 -04:00
teleprint-me 0479e9695f
patch: Add exception handling for non-existent vocab related files 2024-05-18 22:14:19 -04:00
teleprint-me 1a82573126
feat: Add example script for automating generating tokenizer model checksums and tests 2024-05-18 20:49:22 -04:00
teleprint-me 006bb60d27
chore: Fix model path references 2024-05-18 19:20:19 -04:00
teleprint-me b6f70b8a0e
chore: Fix line spacing 2024-05-18 16:59:20 -04:00
teleprint-me 832b449cbd
feat: Add pre-tokenizer CLI tooling 2024-05-18 14:33:56 -04:00
teleprint-me 04fb7886c5
chore: Apply isort to package gguf init 2024-05-18 14:33:22 -04:00
teleprint-me 2ef73ee6e4
refactor: Apply SoC for HF requests, vocab, and weights 2024-05-18 13:45:21 -04:00
teleprint-me 5eda2c9485
feat: Add pre-tokenizer logging 2024-05-18 13:21:22 -04:00
teleprint-me b2ca23c746
feat: Add method for generating the checksums and writing the results to a json file 2024-05-18 01:46:13 -04:00
teleprint-me 302258721b
refactor: Apply model schema to tokenizer downloads
- Add imports for json and hashlib
- Add missing models: phi, stablelm, mistral, and mixtral
- Fix constructor logic
- Fix how models are accessed
- Apply model schema to download_model method
2024-05-18 01:26:39 -04:00
teleprint-me f7515abf49
feat: Add tokenizer types, model types, and model repos 2024-05-18 00:37:19 -04:00
teleprint-me 3ba01c7a0e
chore: Fix spacing 2024-05-18 00:10:42 -04:00
teleprint-me 1a286c8e21
refactor: Clean up variable names and separate concerns when downloading tokenizers 2024-05-17 23:27:30 -04:00
teleprint-me 5c8144e645
feat: Add download_model method and fix references for clarity to mitigate confusion 2024-05-17 23:00:12 -04:00
teleprint-me 4790f76740
feat: Add prototype for requesting vocab related files 2024-05-17 21:08:39 -04:00
teleprint-me 98cf788990
patch: Apply minor fixes for handling headers and writing content 2024-05-17 21:07:51 -04:00
teleprint-me 742abebb39
refactor: Add log for status and fix url path variable name 2024-05-17 20:37:59 -04:00
teleprint-me ba13d64bb3
feat: Add utils for logging and writing when interacting with HuggingFaceHub 2024-05-17 20:26:21 -04:00
teleprint-me dbdf6c2b1d
feat: Add prototype for managing huggingface hub content 2024-05-17 20:00:48 -04:00
compilade ee52225067
convert-hf : support direct Q8_0 conversion (#7234)
* convert-hf : support q8_0 conversion

* convert-hf : add missing ftype

This was messing with the checksums otherwise.

* convert-hf : add missing ftype to Baichuan and Xverse

I didn't notice these on my first pass.
2024-05-13 14:10:51 -04:00
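The Q8_0 conversion referenced above targets GGML's block quantization format: groups of 32 weights share a single fp16 scale, with each weight stored as a signed 8-bit integer. This is not the repository's implementation, just a minimal numpy sketch of the scheme (function name and clipping behavior are illustrative):

```python
import numpy as np

def quantize_q8_0(x: np.ndarray):
    """Simplified Q8_0-style quantization: blocks of 32 float values
    share one fp16 scale d = amax / 127; weights become int8."""
    blocks = x.astype(np.float32).reshape(-1, 32)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    d = (amax / 127.0).astype(np.float16)       # one fp16 scale per block
    scale = d.astype(np.float32)
    safe = np.where(scale > 0, scale, 1.0)      # avoid div-by-zero on all-zero blocks
    q = np.rint(blocks / safe).clip(-128, 127).astype(np.int8)
    return d, q
```

Dequantization is just `q * d` per block, so the reconstruction error is bounded by half a quantization step per weight.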
compilade 5a419926b0
convert-hf : support bfloat16 conversion (#7158)
* convert-hf : support bfloat16 conversion

* gguf-py : flake8 fixes

* convert-hf : add missing space after comma

* convert-hf : get bit-exact same output as ./quantize

The quantization version was missing.

* convert-hf : don't round bf16 NANs

* convert-hf : save some memory with np.int16 intermediate bf16 weights

* convert-hf : more closely match llama.cpp with which weights to keep in f32

* convert-hf : add --outtype auto-f16

A reason for this to exist is for model quantizers who want an initial
GGUF with the most fidelity to the original model while still using
a 16-bit float type instead of 32-bit floats.

* convert-hf : remove a semicolon because flake8 doesn't like it

It's a reflex from when programming in C/C++, I guess.

* convert-hf : support outtype templating in outfile name

* convert-hf : rename --outtype auto-f16 to --outtype auto
2024-05-11 11:06:26 -04:00
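The bfloat16 conversion above amounts to keeping the high 16 bits of each IEEE-754 float32, with round-to-nearest-even on the discarded half and (as the "don't round bf16 NANs" commit notes) special-casing NaNs so rounding cannot carry into the exponent and turn them into infinities. A minimal numpy sketch of that idea, not the repo's exact code (here NaNs are simply truncated, which may differ in payload handling):

```python
import numpy as np

def f32_to_bf16(x: np.ndarray) -> np.ndarray:
    """Convert float32 to bfloat16 stored as uint16 (the high 16 bits),
    round-to-nearest-even; NaNs are passed through untouched."""
    v = x.astype(np.float32).view(np.uint32)
    # NaN: exponent bits all ones, mantissa nonzero
    nan_mask = (v & 0x7FFFFFFF) > 0x7F800000
    # round to nearest even: add 0x7FFF plus the lsb of the kept half
    rounded = v + (0x7FFF + ((v >> 16) & 1))
    v = np.where(nan_mask, v, rounded)
    return (v >> 16).astype(np.uint16)
```

Storing the result as `np.uint16` mirrors the memory-saving point in the commit messages: the intermediate weights occupy half the space of float32.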
Joan Fontanals b83cc3f5b3
llama : add Jina Embeddings architecture (#6826)
* feat: first things to do

* feat: create tensors for Jina architecture

* fix: use other tensors

* feat: embedding gets results

* fix: fix usage of ALIBI

* fix: clean prints

* fix: do some cleanup unused vars

* fix: revert changes to Makefile and CMakeLists

* fix: revert some changes

* fix: fix small detail

* fix: fix convert formatting

* fix: fix linting and editor

* feat: set proper vocab settings

* fix: JinaBertForMaskedLM registration

* feat: support q_normalization and k_normalization in Jina arch

* feat: handle gpt2 tokenizer with Jina architecture

* feat: example comments in embedding

* feat: rename Jina Bert to Jina Bert V2

* fix: add some changes as per review

* feat: proper KQ_pos for Jina embeddings

* feat: add capacity to load models ES and DE for Spanish

* llama : fix pre-tokenizers

* ggml : full ALiBi support

* ggml : update ggml_soft_max_ext() CUDA, SYCL

* ggml : ggml_flash_attn_ext() support ALiBi (CPU)

* ggml : ggml_flash_attn_ext() support ALiBi (Metal)

* ggml : fix warning

* ggml : ggml_flash_attn_ext() support ALiBi (CUDA)

ggml-ci

* minor : clean-up

* embedding : add warning about missing SEP

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-11 10:46:09 +03:00
Georgi Gerganov 9cb317f77e
ggml : full ALiBi support (#7192)
* ggml : full ALiBi support

* ggml : update ggml_soft_max_ext() CUDA, SYCL

* ggml : ggml_flash_attn_ext() support ALiBi (CPU)

* ggml : ggml_flash_attn_ext() support ALiBi (Metal)

* ggml : fix warning

* ggml : ggml_flash_attn_ext() support ALiBi (CUDA)

ggml-ci

* ggml : fix assert message

* vulkan : add dev notes

* ggml : require mask when using ALiBi

ggml-ci

* convert : fix convert for refact models
2024-05-11 10:32:41 +03:00
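The "full ALiBi support" commits above concern the linear attention bias that ALiBi adds to the pre-softmax scores (which is why `ggml_soft_max_ext()` and the flash-attention paths all needed updates). As an illustration of the math only, not of the ggml kernels, here is a small numpy sketch using the standard slope schedule for power-of-two head counts:

```python
import numpy as np

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    """Build the ALiBi bias tensor added to attention scores before softmax."""
    # head-specific slopes: 2^(-8h/H) for h = 1..H (power-of-two head counts)
    slopes = 2.0 ** (-8.0 * np.arange(1, n_heads + 1) / n_heads)
    # the bias depends only on relative position: zero on the diagonal,
    # increasingly negative for keys further in the past
    rel = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]  # k - q
    return slopes[:, None, None] * rel[None, :, :]                   # (heads, q, k)
```

With a causal mask applied on top, only the lower triangle of each `(q, k)` slice matters, penalizing distant keys linearly per head.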