IMPORTANT: Ensure you've thoroughly reviewed the [AGENTS.md](AGENTS.md) file before beginning any work.
# AI Policy

- AI is assistive only; AI-generated PRs are restricted per [AGENTS.md](AGENTS.md)
- Contributors must review and write their code themselves

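# Code Style & Conventions

- Use snake_case naming, ordered to optimize for the longest common prefix (e.g. `number_small`, `number_big` rather than `small_number`, `big_number`)
- Indent with 4 spaces and put opening braces on the same line
- Put a space on both sides of `*` and `&` in declarations: `void * ptr`, `int & a`
- Avoid templates and fancy modern STL constructs; prefer basic `for` loops and keep the code simple
- Use sized integer types such as `int32_t` in the public API
- See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guidelines, including naming rules and the PR process
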
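A short C++ sketch of these rules in practice. The `counter_state` struct and the `counter_add`/`counter_sum` functions are hypothetical, made up only to show naming, spacing, and brace placement; they are not part of llama.cpp or ggml.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type: related fields share the "count_" prefix,
// and sized integer types are used as in the public API.
struct counter_state {
    int32_t count_total;
    int32_t count_errors;
};

// Hypothetical helper: space on both sides of '&' for references,
// 4-space indentation, opening braces on the same line.
static void counter_add(counter_state & state, int32_t n_items, bool is_error) {
    state.count_total += n_items;
    if (is_error) {
        state.count_errors += n_items;
    }
}

// Hypothetical helper: space on both sides of '*' for pointers,
// and a basic for loop instead of an STL algorithm.
static int32_t counter_sum(const int32_t * values, int32_t n_values) {
    assert(values != nullptr || n_values == 0);
    int32_t sum = 0;
    for (int32_t i = 0; i < n_values; i++) {
        sum += values[i];
    }
    return sum;
}
```

Both functions start with `counter_`, so they sort and autocomplete together, which is the point of the longest-common-prefix rule.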
# ggml Tensor Conventions

- Tensor data is stored in row-major order
- Dimension 0 is the number of columns, dimension 1 the number of rows, and dimension 2 the number of matrices
- **Matrix multiplication is unconventional**: `C = ggml_mul_mat(ctx, A, B)` means `C^T = A * B^T`, i.e. `C = B * A^T`

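A plain C++ sketch of these conventions that does not call ggml. The `ne`/`nb` names mirror the element-count and byte-stride fields of `struct ggml_tensor`, but `tensor_offset` and `mul_mat_ref` are hypothetical helpers that assume contiguous `float` data and ignore broadcasting over higher dimensions and quantized types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset of element (i0, i1, i2) in a contiguous row-major float tensor.
// ne[0] = columns, ne[1] = rows, ne[2] = matrices.
static size_t tensor_offset(const int64_t ne[3], int64_t i0, int64_t i1, int64_t i2) {
    assert(i0 < ne[0] && i1 < ne[1] && i2 < ne[2]);
    const size_t nb0 = sizeof(float);          // stride to the next column
    const size_t nb1 = nb0 * (size_t) ne[0];   // stride to the next row
    const size_t nb2 = nb1 * (size_t) ne[1];   // stride to the next matrix
    return (size_t) i0 * nb0 + (size_t) i1 * nb1 + (size_t) i2 * nb2;
}

// Hypothetical reference for the semantics of C = ggml_mul_mat(ctx, A, B), 2-D case:
//   A: ne0 = k, ne1 = m  (m rows of length k)
//   B: ne0 = k, ne1 = n  (n rows of length k)
//   C: ne0 = m, ne1 = n  (n rows of length m)
// Each output element is a dot product of a row of A with a row of B,
// so C = B * A^T, equivalently C^T = A * B^T.
static void mul_mat_ref(const float * a, const float * b, float * c,
                        int64_t k, int64_t m, int64_t n) {
    for (int64_t i_n = 0; i_n < n; i_n++) {
        for (int64_t i_m = 0; i_m < m; i_m++) {
            float sum = 0.0f;
            for (int64_t i_k = 0; i_k < k; i_k++) {
                sum += a[i_m * k + i_k] * b[i_n * k + i_k];
            }
            c[i_n * m + i_m] = sum;
        }
    }
}
```

Passing the identity as `B` returns `A^T`, a quick way to see that the output takes its column count from `A`'s row count. Both operands must share dimension 0 (`k`), because the dot products run along rows.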
# Quantization

- See [docs/quantization/](docs/quantization/) for the full documentation
- See [docs/quantization/09-adding-new-types.md](docs/quantization/09-adding-new-types.md) for how to add a new quantization type

## Key Files

- `ggml/include/ggml.h`: type enum (`ggml_type`)
- `ggml/src/ggml-common.h`: quantization block structures
- `ggml/src/ggml-quants.c`: reference quantize/dequantize implementations
- `tools/quantize/quantize.cpp`: command-line quantization tool
- `src/llama-quant.cpp`: core quantization engine

## Quantization Families

- **Q**: simple uniform block quantization (one scale per block, plus a minimum in some variants)
- **K**: super-block quantization (multiple sub-blocks per super-block)
- **IQ**: importance-weighted quantization
- **TQ**: ternary quantization (weights restricted to -1, 0, +1 times a scale)
- **MXFP**: OCP Microscaling (MX) floating-point quantization, e.g. `MXFP4`

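To illustrate the simplest (**Q**) family, here is a minimal C++ sketch of symmetric 8-bit block quantization modeled on the idea behind `Q8_0`: each block of 32 weights shares one scale `d = max|x| / 127`, and each weight is stored as `round(x / d)` in an `int8_t`. The struct and function names are hypothetical, and the scale is kept as a `float` for simplicity (ggml stores it as fp16); see `ggml/src/ggml-common.h` and `ggml/src/ggml-quants.c` for the real layout and reference code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int32_t QK = 32;  // weights per block (assumed, matching Q8_0's block size)

// Hypothetical block layout; the real Q8_0 block stores d as fp16.
struct block_q8_sketch {
    float  d;       // per-block scale
    int8_t qs[QK];  // quantized weights
};

// Symmetric uniform quantization of one block of QK floats.
static void quantize_block(const float * x, block_q8_sketch & out) {
    float amax = 0.0f;  // largest magnitude in the block
    for (int32_t i = 0; i < QK; i++) {
        amax = std::fmax(amax, std::fabs(x[i]));
    }
    const float d  = amax / 127.0f;                 // maps [-amax, amax] onto [-127, 127]
    const float id = d != 0.0f ? 1.0f / d : 0.0f;   // an all-zero block quantizes to zeros
    out.d = d;
    for (int32_t i = 0; i < QK; i++) {
        out.qs[i] = (int8_t) std::lround(x[i] * id);
    }
}

// Dequantization multiplies each stored integer by the block scale.
static void dequantize_block(const block_q8_sketch & in, float * y) {
    for (int32_t i = 0; i < QK; i++) {
        y[i] = in.qs[i] * in.d;
    }
}
```

Because each weight is rounded to the nearest multiple of `d`, the round-trip error per weight is at most half a step (`d / 2`), up to floating-point rounding.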