llama.cpp

Author	SHA1	Message	Date
Heiner	9a0629d545	Don't multiply embeddings with embedding_multiplier_scale as it happens in llama.cpp.	2024-05-25 13:27:43 +02:00
Heiner	ef671c693d	Address review comments by foldl.	2024-05-25 13:27:43 +02:00
Heiner	d894497a96	Move print to logging: Fixes.	2024-05-25 13:27:43 +02:00
Brian	5bc4f10ee9	Update convert_grok.py to use logging module	2024-05-25 13:27:43 +02:00
Heiner	08427630c3	Use only one list of weight names, with values from the gguf module. This saves weights in the order in which they are in the Grok-1 files. Since we operate weight-by-weight now, we no longer need caches and name2key translations. Per reviewer request, I also moved to using keys in gguf.TENSOR_NAMES.	2024-05-25 13:27:43 +02:00
Heiner	3c57743874	Don't split MoE weights. As per https://github.com/ggerganov/llama.cpp/pull/7058#issuecomment-2092967508. This helps avoid a memcopy when running.	2024-05-25 13:27:43 +02:00
Heiner	6ddf93b286	Script to convert Grok-1 weights from raw JAX pickle files.	2024-05-25 13:27:43 +02:00