Commit Graph

12 Commits

Author SHA1 Message Date
Heiner abc958b07e Move noqa comment to where the latest flake8 likes it. 2024-05-25 13:27:43 +02:00
Heiner 0a1ef1127f Write tensors in layer order. 2024-05-25 13:27:43 +02:00
Heiner 60b29ea6e4 More constants from gguf. 2024-05-25 13:27:43 +02:00
Heiner e2f13a3346 Use Q8_0 quantization from gguf module. 2024-05-25 13:27:43 +02:00
  This makes tensors exactly as in https://huggingface.co/Arki05/Grok-1-GGUF/tree/main/Q8_0
Heiner f177b6596c Fix layer order. 2024-05-25 13:27:43 +02:00
Heiner 9a0629d545 Don't multiply embeddings with embedding_multiplier_scale as it happens in llama.cpp. 2024-05-25 13:27:43 +02:00
Heiner ef671c693d Address review comments by foldl. 2024-05-25 13:27:43 +02:00
Heiner d894497a96 Move print to logging: Fixes. 2024-05-25 13:27:43 +02:00
Brian 5bc4f10ee9 Update convert_grok.py to use logging module 2024-05-25 13:27:43 +02:00
Heiner 08427630c3 Use only one list of weight names, with values from the gguf module. 2024-05-25 13:27:43 +02:00
  This saves weights in the order in which they are in the Grok-1 files.
  Since we operate weight-by-weight now, we no longer need caches and name2key translations.
  Per reviewer request, I also moved to using keys in gguf.TENSOR_NAMES.
Heiner 3c57743874 Don't split MoE weights. 2024-05-25 13:27:43 +02:00
  As per https://github.com/ggerganov/llama.cpp/pull/7058#issuecomment-2092967508.
  This helps avoid a memcopy when running.
Heiner 6ddf93b286 Script to convert Grok-1 weights from raw JAX pickle files. 2024-05-25 13:27:43 +02:00
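Several commits above reference the Q8_0 quantization format from the gguf module. As background, a minimal pure-Python sketch of the Q8_0 block layout is below: blocks of 32 values, one float scale per block, int8 quants. This mirrors llama.cpp's `quantize_row_q8_0` scheme; the function names here are hypothetical and not the gguf module's actual API.

```python
# Sketch of Q8_0 block quantization (assumption: 32-value blocks with one
# scale each, matching llama.cpp's Q8_0; not the gguf module's real code).

QK8_0 = 32  # values per Q8_0 block

def quantize_q8_0_block(block):
    """Quantize one block of QK8_0 floats to (scale, list of int8 quants)."""
    assert len(block) == QK8_0
    amax = max(abs(x) for x in block)     # largest magnitude in the block
    d = amax / 127.0 if amax else 0.0     # per-block scale
    inv = 1.0 / d if d else 0.0
    qs = [int(round(x * inv)) for x in block]  # quants in [-127, 127]
    return d, qs

def dequantize_q8_0_block(d, qs):
    """Reconstruct approximate floats from a quantized block."""
    return [d * q for q in qs]
```

A round trip through these two functions reconstructs each value to within half a quantization step (d / 2), which is why Q8_0 is close to lossless for typical weight tensors.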