Commit Graph

12 Commits

Author SHA1 Message Date
Heiner abc958b07e Move noqa comment to where the latest flake8 likes it. 2024-05-25 13:27:43 +02:00
Heiner 0a1ef1127f Write tensors in layer order. 2024-05-25 13:27:43 +02:00
Heiner 60b29ea6e4 More constants from gguf. 2024-05-25 13:27:43 +02:00
Heiner e2f13a3346 Use Q8_0 quantization from gguf module. 2024-05-25 13:27:43 +02:00
  This makes tensors exactly as in https://huggingface.co/Arki05/Grok-1-GGUF/tree/main/Q8_0
Heiner f177b6596c Fix layer order. 2024-05-25 13:27:43 +02:00
Heiner 9a0629d545 Don't multiply embeddings with embedding_multiplier_scale as it happens in llama.cpp. 2024-05-25 13:27:43 +02:00
Heiner ef671c693d Address review comments by foldl. 2024-05-25 13:27:43 +02:00
Heiner d894497a96 Move print to logging: Fixes. 2024-05-25 13:27:43 +02:00
Brian 5bc4f10ee9 Update convert_grok.py to use logging module 2024-05-25 13:27:43 +02:00
Heiner 08427630c3 Use only one list of weight names, with values from the gguf module. 2024-05-25 13:27:43 +02:00
  This saves weights in the order in which they are in the Grok-1 files.
  Since we operate weight-by-weight now, we no longer need caches and name2key translations.
  Per reviewer request, I also moved to using keys in gguf.TENSOR_NAMES.
Heiner 3c57743874 Don't split MoE weights. 2024-05-25 13:27:43 +02:00
  As per https://github.com/ggerganov/llama.cpp/pull/7058#issuecomment-2092967508.
  This helps avoid a memcopy when running.
Heiner 6ddf93b286 Script to convert Grok-1 weights from raw JAX pickle files. 2024-05-25 13:27:43 +02:00
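Several commits above reference the Q8_0 quantization format from the gguf module. As background, a minimal pure-Python sketch of the Q8_0 block layout is below: blocks of 32 values, one float scale per block, int8 quants. This mirrors llama.cpp's `quantize_row_q8_0` scheme; the function names here are hypothetical and not the gguf module's actual API.

```python
# Sketch of Q8_0 block quantization (assumption: 32-value blocks with one
# scale each, matching llama.cpp's Q8_0; not the gguf module's real code).

QK8_0 = 32  # values per Q8_0 block

def quantize_q8_0_block(block):
    """Quantize one block of QK8_0 floats to (scale, list of int8 quants)."""
    assert len(block) == QK8_0
    amax = max(abs(x) for x in block)     # largest magnitude in the block
    d = amax / 127.0 if amax else 0.0     # per-block scale
    inv = 1.0 / d if d else 0.0
    qs = [int(round(x * inv)) for x in block]  # quants in [-127, 127]
    return d, qs

def dequantize_q8_0_block(d, qs):
    """Reconstruct approximate floats from a quantized block."""
    return [d * q for q in qs]
```

A round trip through these two functions reconstructs each value to within half a quantization step (d / 2), which is why Q8_0 is close to lossless for typical weight tensors.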