Heiner
9a0629d545
Don't multiply embeddings by embedding_multiplier_scale, since llama.cpp already applies it.
2024-05-25 13:27:43 +02:00
Heiner
ef671c693d
Address review comments by foldl.
2024-05-25 13:27:43 +02:00
Heiner
d894497a96
Move print to logging: Fixes.
2024-05-25 13:27:43 +02:00
Brian
5bc4f10ee9
Update convert_grok.py to use logging module
2024-05-25 13:27:43 +02:00
Heiner
08427630c3
Use only one list of weight names, with values from the gguf module.
This saves weights in the order in which they appear in the Grok-1 files.
Since we operate weight-by-weight now, we no longer need caches and
name2key translations.
Per reviewer request, I also moved to using keys in gguf.TENSOR_NAMES.
2024-05-25 13:27:43 +02:00
Heiner
3c57743874
Don't split MoE weights.
Keeping the MoE weights unsplit avoids a memory copy at run time, as per
https://github.com/ggerganov/llama.cpp/pull/7058#issuecomment-2092967508
2024-05-25 13:27:43 +02:00
Heiner
6ddf93b286
Script to convert Grok-1 weights from raw JAX pickle files.
2024-05-25 13:27:43 +02:00