Heiner
abc958b07e
Move noqa comment to where the latest flake8 likes it.
2024-05-25 13:27:43 +02:00
Heiner
0a1ef1127f
Write tensors in layer order.
2024-05-25 13:27:43 +02:00
Heiner
60b29ea6e4
More constants from gguf.
2024-05-25 13:27:43 +02:00
Heiner
e2f13a3346
Use Q8_0 quantization from gguf module.
This makes tensors exactly as in https://huggingface.co/Arki05/Grok-1-GGUF/tree/main/Q8_0
2024-05-25 13:27:43 +02:00
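The Q8_0 format referenced in the commit above stores each block of 32 float values as one float16 scale plus 32 int8 quants. The following is an illustrative sketch of that scheme, not the gguf module's actual implementation; function names here are hypothetical:

```python
import numpy as np

def quantize_q8_0(x: np.ndarray):
    """Illustrative Q8_0 quantization: blocks of 32 float32 values,
    each stored as a float16 scale plus 32 int8 quants."""
    assert x.size % 32 == 0, "Q8_0 operates on blocks of 32 values"
    blocks = x.reshape(-1, 32).astype(np.float32)
    # Per-block scale: map the largest absolute value onto the int8 range.
    d = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    d[d == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.round(blocks / d).astype(np.int8)
    return d.astype(np.float16), q

def dequantize_q8_0(d: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float32 values from scales and quants."""
    return (d.astype(np.float32) * q.astype(np.float32)).reshape(-1)
```

The per-block fp16 scale bounds the round-trip error at roughly half a quantization step, which is why Q8_0 output can be compared bit-for-bit against a reference conversion such as the Hugging Face upload linked above.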
Heiner
f177b6596c
Fix layer order.
2024-05-25 13:27:43 +02:00
Heiner
9a0629d545
Don't multiply embeddings with embedding_multiplier_scale as it happens in llama.cpp.
2024-05-25 13:27:43 +02:00
Heiner
ef671c693d
Address review comments by foldl.
2024-05-25 13:27:43 +02:00
Heiner
d894497a96
Move print to logging: Fixes.
2024-05-25 13:27:43 +02:00
Brian
5bc4f10ee9
Update convert_grok.py to use logging module
2024-05-25 13:27:43 +02:00
Heiner
08427630c3
Use only one list of weight names, with values from the gguf module.
This saves weights in the order in which they are in the Grok-1 files.
Since we now operate weight-by-weight, we no longer need caches or
name2key translations.
Per reviewer request, I also moved to using keys in gguf.TENSOR_NAMES.
2024-05-25 13:27:43 +02:00
Heiner
3c57743874
Don't split MoE weights.
As per https://github.com/ggerganov/llama.cpp/pull/7058#issuecomment-2092967508 .
This avoids a memcpy at inference time.
2024-05-25 13:27:43 +02:00
Heiner
6ddf93b286
Script to convert Grok-1 weights from raw JAX pickle files.
2024-05-25 13:27:43 +02:00