
# quantize

TODO
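Until the docs above are filled in, a typical invocation might look like the following sketch. The binary name, model paths, and positional-argument order are assumptions based on the surrounding examples layout, not confirmed by this README; the command is only printed here, not executed:

```shell
# Hypothetical usage sketch (paths are assumptions): quantize an F16 GGUF
# model down to Q4_K_M. We build the command string and print it rather
# than invoking the (possibly unbuilt) binary.
CMD="./quantize ./models/7B/ggml-model-f16.gguf ./models/7B/ggml-model-Q4_K_M.gguf Q4_K_M"
echo "$CMD"
```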

## Llama 2 7B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.35                  |
| Q3_K_S       | 3.50                  |
| Q3_K_M       | 3.91                  |
| Q3_K_L       | 4.27                  |
| Q4_K_S       | 4.58                  |
| Q4_K_M       | 4.84                  |
| Q5_K_S       | 5.52                  |
| Q5_K_M       | 5.68                  |
| Q6_K         | 6.56                  |

## Llama 2 13B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.34                  |
| Q3_K_S       | 3.48                  |
| Q3_K_M       | 3.89                  |
| Q3_K_L       | 4.26                  |
| Q4_K_S       | 4.56                  |
| Q4_K_M       | 4.83                  |
| Q5_K_S       | 5.51                  |
| Q5_K_M       | 5.67                  |
| Q6_K         | 6.56                  |

## Llama 2 70B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.40                  |
| Q3_K_S       | 3.47                  |
| Q3_K_M       | 3.85                  |
| Q3_K_L       | 4.19                  |
| Q4_K_S       | 4.53                  |
| Q4_K_M       | 4.80                  |
| Q5_K_S       | 5.50                  |
| Q5_K_M       | 5.65                  |
| Q6_K         | 6.56                  |
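The BPW figures above translate directly into a rough file-size estimate: multiply the parameter count by the bits per weight and divide by 8. A minimal sketch (the round 7e9 parameter count and the helper name are illustrative; real GGUF files also carry metadata and keep a few tensors at higher precision, so actual sizes differ somewhat):

```python
def estimate_quantized_size(n_params: float, bpw: float) -> float:
    """Rough quantized model size in bytes: parameters * bits-per-weight / 8."""
    return n_params * bpw / 8

# Llama 2 7B at Q4_K_M (4.84 BPW from the table above):
size_bytes = estimate_quantized_size(7e9, 4.84)
print(f"{size_bytes / 2**30:.2f} GiB")  # roughly 3.9 GiB
```

The same arithmetic explains the spread in the tables: moving from Q2_K to Q6_K roughly doubles the bits spent per weight, and hence the file size.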