1.5 KiB
1.5 KiB
IMPORTANT: Ensure you've thoroughly reviewed the AGENTS.md file before beginning any work.
AI Policy
- AI is assistive only; AI-generated PRs are restricted per AGENTS.md
- Contributor reviews and writes code themselves
Code Style & Conventions
- snake_case naming; optimize for longest common prefix
- 4 spaces indentation, brackets on same line
void * ptr,int & a(space around pointer/reference)- Avoid templates and fancy STL
- Use sized integer types (
int32_t) in public API - See CONTRIBUTING.md for full guidelines, naming, and PR process
ggml Tensor Conventions
- Data stored in row-major order
- Dimension 0 = columns, dimension 1 = rows, dimension 2 = matrices
- Matrix multiply is unconventional:
C = ggml_mul_mat(ctx, A, B)meansC^T = A * B^T
Quantization
- See docs/quantization/ for comprehensive documentation
- See docs/quantization/09-adding-new-types.md for adding new types
Key Files
ggml/include/ggml.h: type enums (ggml_type)ggml/src/ggml-common.h: block structuresggml/src/ggml-quants.c: reference quantize/dequantize implementationstools/quantize/quantize.cpp: CLI toolsrc/llama-quant.cpp: core quantization engine
Quantization Families
- Q: simple uniform quantization
- K: super-block quantization (multiple sub-blocks per super-block)
- IQ: importance-weighted quantization
- T: ternary quantization
- MXFP: Microsoft floating-point quantization