teleprint-me
47ef6157a0
refactor: Add prototyped bridge interface for tokenizers and llama.cpp
2024-05-31 20:35:41 -04:00
teleprint-me
c2e48979e2
Merge branch 'master' into auto-model-support
2024-05-31 14:11:30 -04:00
Galunid
9c4c9cc83f
Move convert.py to examples/convert-legacy-llama.py ( #7430 )
* Move convert.py to examples/convert-no-torch.py
* Fix CI, scripts, readme files
* convert-no-torch -> convert-legacy-llama
* Move vocab thing to vocab.py
* Fix convert-no-torch -> convert-legacy-llama
* Fix lost convert.py in ci/run.sh
* Fix imports
* Fix gguf not imported correctly
* Fix flake8 complaints
* Fix check-requirements.sh
* Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE
* Review fixes
2024-05-30 21:40:00 +10:00
Galunid
eb57fee51f
gguf-py : Add tokenizer.ggml.pre to gguf-new-metadata.py ( #7627 )
2024-05-30 02:10:40 +02:00
teleprint-me
6a725cf2d1
Merge branch 'master' into auto-model-support
2024-05-28 19:19:08 -04:00
teleprint-me
5c92809397
refactor: Apply updates to example script for generating the registry
2024-05-28 19:16:52 -04:00
teleprint-me
f1d067e7a6
refactor: Simplify huggingface hub api and update to reflect changes in constants.py
2024-05-28 19:16:32 -04:00
teleprint-me
9dbc9571a3
refactor: Simplify tokenizers implementation
2024-05-28 18:42:39 -04:00
fairydreaming
ee3dff6b8e
Add support for DeepseekV2ForCausalLM ( #7519 )
* common : increase max number of experts to 160
* common : add tensors ATTN_Q_A, ATTN_Q_A_NORM, ATTN_Q_B, ATTN_KV_A_MQA, ATTN_KV_A_NORM, ATTN_KV_B needed by DeepSeek-V2 MLA (multi-head latent attention) architecture
* common : add model header parameters: leading_dense_block_count, expert_feed_forward_length, expert_shared_count, expert_weights_scale, attention.q_lora_rank, attention.kv_lora_rank, rope.scaling.yarn_log_multiplier
* convert-hf : add model conversion support for DeepseekV2ForCausalLM
* llama : add model types for DeepSeek-V2 and DeepSeek-V2-Lite models
* llama : add two new llm_build_moe_ffn() arguments: scale_w (whether to scale weights of selected MoE experts) and w_scale (numerical value of the scaling factor)
* llama : add inference support for LLM_ARCH_DEEPSEEK2
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-05-28 17:07:05 +02:00
teleprint-me
0a478c048a
chore: Add pre tokenizers and include enum mappings
2024-05-27 03:11:40 -04:00
teleprint-me
215394947e
feat: Add prototype for bootstrapping registry
2024-05-27 01:05:36 -04:00
teleprint-me
0732bd9051
feat: Ignore pre-existing model files
2024-05-27 00:06:53 -04:00
teleprint-me
b1c922fec7
feat: Add a proto sketch for handling model vocab metadata
2024-05-27 00:06:39 -04:00
teleprint-me
7f48eb97db
feat: Add experimental model registry for known models and their related metadata
2024-05-26 23:22:03 -04:00
teleprint-me
36bea177cb
Merge branch 'master' into auto-model-support
2024-05-26 18:07:18 -04:00
teleprint-me
b3a54291cb
Merge branch 'huggingface-hub-api' into auto-model-support
2024-05-25 20:28:40 -04:00
teleprint-me
e4275bcef4
feat: Add example script for downloading models
2024-05-25 19:12:34 -04:00
teleprint-me
fcd20ab9e9
chore: Add comments for each file extension type
2024-05-25 19:12:16 -04:00
teleprint-me
da72554f58
feat: Add static methods for resolving model types and model extensions
2024-05-25 19:11:56 -04:00
teleprint-me
f30bd63252
refactor: Add function for building and parsing CLI arguments
2024-05-25 14:41:13 -04:00
teleprint-me
e9759dee0b
docs: Add revisions to hub-vocab.py module level docstring
2024-05-25 14:33:23 -04:00
teleprint-me
6c1b0111a1
refactor: Apply huggingface_hub api to CLI
2024-05-25 04:16:10 -04:00
teleprint-me
63c3410492
refactor: Add support for model file types
2024-05-25 04:15:39 -04:00
teleprint-me
2ffe6b89c8
Refactor HFHubModel and HFHubTokenizer to fix reference issues
2024-05-25 04:15:15 -04:00
teleprint-me
fda2319d7b
refactor: Streamline method signatures and clarify method names related to downloading repo files
2024-05-25 03:32:27 -04:00
teleprint-me
4438d052aa
refactor: Abstract file and logger management to streamline api interface
2024-05-25 02:57:59 -04:00
teleprint-me
99275a1606
refactor: Simplify API and merge HFModel into HFHub
2024-05-25 02:10:52 -04:00
teleprint-me
168297f11c
refactor: Add remote repository listings to the base HFHub class
2024-05-24 23:57:45 -04:00
teleprint-me
6da2bd6fbc
patch: Apply fix for paths and logging
2024-05-24 21:47:47 -04:00
compilade
b83bab15a5
gguf-py : fix and simplify quantized shape round-trip ( #7483 )
* gguf-py : fix and simplify quantized shape round-trip
* gguf-py : remove unused import
2024-05-25 11:11:48 +10:00
fairydreaming
fbca2f27fc
Add support for ArcticForCausalLM ( #7020 )
* common : increase max number of experts to 128
* common : add tensor LLM_TENSOR_FFN_NORM_EXPS for normalization before MoE that runs in parallel to attention + ffn
* gguf-py : add architecture-specific block mappings that override selected general block mappings
* convert-hf : add model conversion support for ArcticForCausalLM
* convert-hf : use added_tokens_decoder from tokenizer_config.json to redefine tokens from SentencePiece model (only for ArcticForCausalLM)
* llama : add inference support for LLM_ARCH_ARCTIC
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-05-24 14:31:13 +02:00
teleprint-me
64096942ce
refactor: Simplify the huggingface hub api to enable flexible model requests
2024-05-24 02:40:34 -04:00
teleprint-me
6c9ac0fc52
refactor: Add a custom tokenizer component and fix vocab request class
2024-05-24 01:30:29 -04:00
teleprint-me
e62e09bbb1
refactor: Apply fix for file path references
2024-05-23 22:59:16 -04:00
teleprint-me
c91dcdf2a4
refactor: Add fixes for logging
2024-05-23 22:58:03 -04:00
teleprint-me
77bc7394c8
refactor: Add tokenizer path, add methods for extracting vocab metadata, fix checksum method name
2024-05-23 21:40:05 -04:00
teleprint-me
b4b553fe6c
chore: Apply ruff formatting for readability
2024-05-23 21:36:51 -04:00
teleprint-me
ea4fc1095e
refactor: Apply fixes to required arguments and fixes to options
2024-05-23 21:36:31 -04:00
teleprint-me
f62080adfa
refactor: Simplify huggingface hub vocab request
2024-05-23 20:50:58 -04:00
teleprint-me
1749209406
refactor: Simplify huggingface hub api implementation
2024-05-23 20:50:15 -04:00
teleprint-me
c92c6ad480
feat: Add CLI tool for fetching vocab files
2024-05-23 20:33:12 -04:00
teleprint-me
0ccf579242
refactor: Apply consistent naming conventions
2024-05-23 17:17:22 -04:00
teleprint-me
9ba6b92c2d
chore: Add required vocabulary constants
2024-05-23 16:57:14 -04:00
teleprint-me
9814b7f9ab
feat: Add custom huggingface hub api
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
2024-05-23 13:48:20 -04:00
Georgi Gerganov
e84b71c2c6
ggml : drop support for QK_K=64 ( #7473 )
* ggml : drop support for QK_K=64
ggml-ci
* opencl : restore QK_K=256 define
2024-05-23 10:00:21 +03:00
teleprint-me
78d7828cc4
chore: Add prototyped CLI options
2024-05-22 19:59:33 -04:00
teleprint-me
cd00be886f
chore: Add model metadata
2024-05-22 19:59:13 -04:00
teleprint-me
1957ca41f2
refactor: Simplify BPE pre-tokenizer mapping
2024-05-22 16:57:29 -04:00
teleprint-me
12285b5325
chore: Map model file and vocab types
2024-05-22 02:58:12 -04:00
teleprint-me
0b43e14030
refactor: Add experimental mapping for BPE pre-tokenizers
2024-05-21 22:45:45 -04:00