llama.cpp

Author	SHA1	Message	Date
teleprint-me	6a725cf2d1	Merge branch 'master' into auto-model-support	2024-05-28 19:19:08 -04:00
teleprint-me	5c92809397	refactor: Apply updates to example script for generating the registry	2024-05-28 19:16:52 -04:00
teleprint-me	f1d067e7a6	refactor: Simplify huggingface hub api and update to reflect changes in constants.py	2024-05-28 19:16:32 -04:00
teleprint-me	9dbc9571a3	refactor: Simplify tokenizers implementation	2024-05-28 18:42:39 -04:00
fairydreaming	ee3dff6b8e	Add support for DeepseekV2ForCausalLM (#7519) * common : increase max number of experts to 160 * common : add tensors ATTN_Q_A, ATTN_Q_A_NORM, ATTN_Q_B, ATTN_KV_A_MQA, ATTN_KV_A_NORM, ATTN_KV_B needed by DeepSeek-V2 MLA (multi-head latent attention) architecture * common : add model header parameters: leading_dense_block_count, expert_feed_forward_length, expert_shared_count, expert_weights_scale, attention.q_lora_rank, attention.kv_lora_rank, rope.scaling.yarn_log_multiplier * convert-hf : add model conversion support for DeepseekV2ForCausalLM * llama : add model types for DeepSeek-V2 and DeepSeek-V2-Lite models * llama : add two new llm_build_moe_ffn() arguments: scale_w (whether to scale weights of selected MoE experts) and w_scale (numerical value of the scaling factor) * llama : add inference support for LLM_ARCH_DEEPSEEK2 --------- Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2024-05-28 17:07:05 +02:00
teleprint-me	0a478c048a	chore: Add pre tokenizers and include enum mappings	2024-05-27 03:11:40 -04:00
teleprint-me	215394947e	feat: Add prototype for bootstrapping registry	2024-05-27 01:05:36 -04:00
teleprint-me	0732bd9051	feat: Ignore pre-existing model files	2024-05-27 00:06:53 -04:00
teleprint-me	b1c922fec7	feat: Add a proto sketch for handling mode vocab metadata	2024-05-27 00:06:39 -04:00
teleprint-me	7f48eb97db	feat: Add experimental model registry for known models and their related metadata	2024-05-26 23:22:03 -04:00
teleprint-me	36bea177cb	Merge branch 'master' into auto-model-support	2024-05-26 18:07:18 -04:00
teleprint-me	b3a54291cb	Merge branch 'huggingface-hub-api' into auto-model-support	2024-05-25 20:28:40 -04:00
teleprint-me	e4275bcef4	feat: Add example script for downloading models	2024-05-25 19:12:34 -04:00
teleprint-me	fcd20ab9e9	chore: Add comments for each file extension type	2024-05-25 19:12:16 -04:00
teleprint-me	da72554f58	feat: Add static methods for resolving model types and model extensions	2024-05-25 19:11:56 -04:00
teleprint-me	f30bd63252	refactor: Add function for building and parsing CLI arguments	2024-05-25 14:41:13 -04:00
teleprint-me	e9759dee0b	docs: Add revisions to hub-vocab.py module level docstring	2024-05-25 14:33:23 -04:00
teleprint-me	6c1b0111a1	refactor: Apply huggingface_hub api to CLI	2024-05-25 04:16:10 -04:00
teleprint-me	63c3410492	refactor: Add support for model file types	2024-05-25 04:15:39 -04:00
teleprint-me	2ffe6b89c8	Refactor HFubModel and HFHubTokenizer to fix reference issues	2024-05-25 04:15:15 -04:00
teleprint-me	fda2319d7b	refactor: Streamline method signatures and clarify method names related to downloading repo files	2024-05-25 03:32:27 -04:00
teleprint-me	4438d052aa	refactor: Abstract file and logger management to streamline api interface	2024-05-25 02:57:59 -04:00
teleprint-me	99275a1606	refactor: Simplify API and merge HFModel into HFHub	2024-05-25 02:10:52 -04:00
teleprint-me	168297f11c	refactor: Add remote repository listings to the bas HFHub class	2024-05-24 23:57:45 -04:00
teleprint-me	6da2bd6fbc	patch: Apply fix for paths and logging	2024-05-24 21:47:47 -04:00
compilade	b83bab15a5	gguf-py : fix and simplify quantized shape round-trip (#7483) * gguf-py : fix and simplify quantized shape round-trip * gguf-py : remove unused import	2024-05-25 11:11:48 +10:00
fairydreaming	fbca2f27fc	Add support for ArcticForCausalLM (#7020) * common : increase max number of experts to 128 * common : add tensor LLM_TENSOR_FFN_NORM_EXPS for normalization before MoE that runs in parallel to attention + ffn * gguf-py : add architecture-specific block mappings that override selected general block mappings * convert-hf : add model conversion support for ArcticForCausalLM * convert-hf : use added_tokens_decoder from tokenizer_config.json to redefine tokens from SentencePiece model (only for ArcticForCausalLM) * llama : add inference support for LLM_ARCH_ARCTIC --------- Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2024-05-24 14:31:13 +02:00
teleprint-me	64096942ce	refactor: Simplify the huggingface hub api to enable flexible model requests	2024-05-24 02:40:34 -04:00
teleprint-me	6c9ac0fc52	refactor: Add a custom tokenizer component and fix vocab request class	2024-05-24 01:30:29 -04:00
teleprint-me	e62e09bbb1	refactor: Apply fix for file path references	2024-05-23 22:59:16 -04:00
teleprint-me	c91dcdf2a4	refactor: Add fixes for logging	2024-05-23 22:58:03 -04:00
teleprint-me	77bc7394c8	refactor: Add tokenizer path, add methods for extracting vocab metadata, fix checksum method name	2024-05-23 21:40:05 -04:00
teleprint-me	b4b553fe6c	chore: Apply ruff formatting for readability	2024-05-23 21:36:51 -04:00
teleprint-me	ea4fc1095e	refactor: Apply fixes to required arguments and fixes to options	2024-05-23 21:36:31 -04:00
teleprint-me	f62080adfa	refactor: Simplify huggingface hub vocab request	2024-05-23 20:50:58 -04:00
teleprint-me	1749209406	refactor: Simplify huggingface hub api implementation	2024-05-23 20:50:15 -04:00
teleprint-me	c92c6ad480	feat: Add CLI tool for fetching vocab files	2024-05-23 20:33:12 -04:00
teleprint-me	0ccf579242	refactor: Apply consistent naming conventions	2024-05-23 17:17:22 -04:00
teleprint-me	9ba6b92c2d	chore: Add required vocabulary constants	2024-05-23 16:57:14 -04:00
teleprint-me	9814b7f9ab	feat: Add custom huggingface hub api Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>	2024-05-23 13:48:20 -04:00
Georgi Gerganov	e84b71c2c6	ggml : drop support for QK_K=64 (#7473) * ggml : drop support for QK_K=64 ggml-ci * opencl : restore QK_K=256 define	2024-05-23 10:00:21 +03:00
teleprint-me	78d7828cc4	chore: Add prototyped CLI options	2024-05-22 19:59:33 -04:00
teleprint-me	cd00be886f	chore: Add model metadata	2024-05-22 19:59:13 -04:00
teleprint-me	1957ca41f2	refactor: Simplify BPE pre-tokenizer mapping	2024-05-22 16:57:29 -04:00
teleprint-me	12285b5325	chore: Map model file and vocab types	2024-05-22 02:58:12 -04:00
teleprint-me	0b43e14030	refactor: Add experimental mapping for BPE pre-tokenizers	2024-05-21 22:45:45 -04:00
teleprint-me	34e14ae96d	refactor: Add experimental model mappings	2024-05-21 19:11:51 -04:00
liuwei-git	201cc11afa	llama : add phi3 128K model support (#7225) * add phi3 128k support in convert-hf-to-gguf * add phi3 128k support in cuda * address build warnings on llama.cpp * adjust index value in cuda long rope freq factors * add long rope support in ggml cpu backend * make freq factors only depend on ctx size * remove unused rope scaling type 'su' frin gguf converter * fix flint warnings on convert-hf-to-gguf.py * set to the short freq factor when context size is small than trained context size * add one line of comments * metal : support rope freq_factors * ggml : update ggml_rope_ext API to support freq. factors * backends : add dev messages to support rope freq. factors * minor : style * tests : update to use new rope API * backends : fix pragma semicolons * minor : cleanup * llama : move rope factors from KV header to tensors * llama : remove tmp assert * cuda : fix compile warning * convert : read/write n_head_kv * llama : fix uninitialized tensors --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-21 23:28:32 +03:00
teleprint-me	b2aac685d5	docs: Fix comment	2024-05-21 16:07:12 -04:00
teleprint-me	83b9fcd3e4	refactor: Rename constants to reduce confusion between references	2024-05-21 16:06:39 -04:00

179 Commits