llama.cpp/tests
Tim Burke c919bc471b
cleanup : remove unused untested code and improve consistency
* cleanup: consolidate MXFP type aliases, fix SoA linker bug on 5 platforms

- Add GGML_TYPE_MXFP8 and GGML_TYPE_MXFP6 short aliases (matching
  existing GGML_TYPE_MXFP4 pattern) and use short names consistently
  throughout the codebase instead of mixing long/short forms.

- Fix missing SoA dequant symbols (dequantize_row_mxfp{4,8,6}_soa_cpu)
  on loongarch, powerpc, riscv, s390, and wasm by adding proper aliases
  to each arch section in arch-fallback.h. Previously these were only
  defined under GGML_CPU_GENERIC, causing linker failures on those
  platforms when using MXFP flash attention.

- Remove 10 files from the PR diff:
  - 5 arch stub files replaced by arch-fallback.h aliases
  - 5 rename-only files (sycl, opencl, repack, llama-quant) reverted
    since the GGML_TYPE_MXFP4 compat alias handles them

* cleanup: DRY FP6 unpack, extract mxfp_kv_params + mxfp_dequant_head helper

- FP6 unpack: x86 and ARM SIMD versions now call ggml_mxfp_unpack_fp6x4()
  from ggml-common.h instead of duplicating the scalar bit manipulation.

- Extract mxfp_kv_params sub-struct from mxfp_fa_params: the 7 symmetric
  K/V fields (dequantize, multihead, soa_elems, qs_per_block,
  head_qs_bytes, head_e8m0_offset, blocks_per_head) are now in a reusable
  struct accessed as mxfp.k and mxfp.v.

- Add mxfp_dequant_head() helper: replaces 4 instances of the multihead
  SoA extraction pattern (2x memcpy + dequant, with multihead/single-head
  branching) with a single function call. Future backends get the pattern
  for free.

* cleanup: extract mxfp_kv_params_init to DRY the K/V init blocks

The K and V initialization in mxfp_fa_params_init were structurally
identical 10-line blocks differing only by tensor/dimension. Extract
into mxfp_kv_params_init(type, D, nb2, ne2) so future MXFP formats
get the multihead SoA addressing logic automatically.

* cleanup: generic MSE round-trip, replace magic buffer sizes with constants

- Remove mse_error_fp8_e4m3 and mse_error_fp6_e2m3: these were identical
  round-trip functions differing only by converter. mxfp_compute_e8m0_mse
  now uses to_elem/to_float directly when mse_error is NULL (FP8/FP6).
  MXFP4 keeps its custom decision-tree MSE. New formats get MSE for free
  by just setting to_elem/to_float in their traits.

- Replace magic 1024/1088 buffer sizes in flash attention with named
  constants MXFP_FA_MAX_D and MXFP_FA_SOA_BUF. One place to change if
  max head dimension grows.

* cleanup: remove dead AoS vec_dot for MXFP8/MXFP6, unify SoA impls

MXFP8 and MXFP6 are KV-cache-only types that use SoA layout for flash
attention. The AoS vec_dot functions (scalar generic, AVX2, NEON) were
dead code — no matmul path uses them.

Removed:
- ggml_vec_dot_mxfp{8,6}_q8_0 from scalar, x86, ARM, quants.h
- ggml_vec_dot_mxfp_q8_0_impl shared helper
- arch-fallback.h aliases for vec_dot mxfp8/mxfp6 (12 lines)
- vec_dot/vec_dot_type registration in ggml-cpu.c

Also unified SoA quantize/dequant: the separate mxfp8_soa_impl and
mxfp6_soa_impl functions (4 functions, ~80 lines) are replaced by two
generic functions (quantize_row_mxfp_soa_impl, dequantize_row_mxfp_soa_impl)
that use traits->bits_per_elem and traits->qs_per_block to handle both
byte-aligned (FP8) and 6-bit packed (FP6) formats. New MXFP formats
get SoA for free by setting these trait fields.

* cleanup: remove all AoS MXFP8/MXFP6 quantize/dequant — SoA only

MXFP8 and MXFP6 are KV-cache-only types. All quantization and
dequantization goes through the SoA (Struct-of-Arrays) path for flash
attention. The AoS (block_mxfp8/block_mxfp6 struct) implementations
were dead code that should never have been added.

Removed:
- quantize_row_mxfp{8,6}_impl, dequantize_row_mxfp{8,6}_impl
- quantize_row_mxfp{8,6}_ref, dequantize_row_mxfp{8,6}
- quantize_mxfp{8,6} (ggml_quantize_chunk wrappers)
- All declarations from ggml-quants.h and quants.h
- to_float/from_float_ref registrations from ggml.c type traits
- from_float registration from ggml-cpu.c CPU traits

Block struct definitions (block_mxfp8, block_mxfp6) are retained for
sizeof() in type traits and validate_row_data.

* cleanup: fail fast in ggml_quantize_chunk for KV-cache-only types

Add explicit GGML_ABORT for MXFP8/MXFP6 in ggml_quantize_chunk —
these are KV-cache-only types that use SoA layout via from_float_soa.
Attempting AoS quantization through this entry point is a bug.
2026-03-22 02:44:56 -04:00
..
peg-parser common: consolidate PEG string parsers (#20263) 2026-03-10 00:29:21 +01:00
.gitignore common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
CMakeLists.txt test : add testing and fixes 2026-03-22 01:07:55 -04:00
export-graph-ops.cpp test-backend-ops: allow loading tests from file and parsing model operators into file (#19896) 2026-03-12 13:26:00 +01:00
get-model.cpp ci : add model tests + script wrapper (#4586) 2024-01-26 14:18:00 +02:00
get-model.h ci : add model tests + script wrapper (#4586) 2024-01-26 14:18:00 +02:00
gguf-model-data.cpp tests : model metadata loading from huggingface (#19796) 2026-02-28 10:44:38 +01:00
gguf-model-data.h tests : model metadata loading from huggingface (#19796) 2026-02-28 10:44:38 +01:00
run-json-schema-to-grammar.mjs llama : move end-user examples to tools directory (#13249) 2025-05-02 20:27:13 +02:00
test-alloc.cpp chore : correct typos [no ci] (#20041) 2026-03-05 08:50:21 +01:00
test-arg-parser.cpp ci, tests : use cmake to download models and remove libcurl dependency (#18791) 2026-01-14 07:46:27 +01:00
test-autorelease.cpp docs : Minor cleanups (#19252) 2026-02-02 08:38:55 +02:00
test-backend-ops.cpp cleanup : remove unused untested code and improve consistency 2026-03-22 02:44:56 -04:00
test-backend-sampler.cpp tests: enable kv_unified to prevent cuda oom error on rtx 2060 (#20645) 2026-03-18 17:40:22 +08:00
test-barrier.cpp Fix race conditions in threadpool when dealing with dynamic/frequent n_threads changes (#17748) 2025-12-10 12:32:23 -08:00
test-c.c ggml : remove kompute backend (#14501) 2025-07-03 07:48:32 +03:00
test-chat-auto-parser.cpp common/parser: fix nasty bug causing subtle corruption of generation prompt (#20825) 2026-03-21 00:19:04 +01:00
test-chat-peg-parser.cpp common/parser: add proper reasoning tag prefill reading (#20424) 2026-03-19 16:58:21 +01:00
test-chat-template.cpp Autoparser - complete refactoring of parser architecture (#18675) 2026-03-06 21:01:00 +01:00
test-chat.cpp common/parser: fix nasty bug causing subtle corruption of generation prompt (#20825) 2026-03-21 00:19:04 +01:00
test-double-float.cpp ggml : minor naming changes (#8433) 2024-07-12 10:46:02 +03:00
test-gbnf-validator.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-gguf-model-data.cpp tests : model metadata loading from huggingface (#19796) 2026-02-28 10:44:38 +01:00
test-gguf.cpp ggml/gguf : prevent integer overflows (#19856) 2026-02-24 20:17:11 +02:00
test-grammar-integration.cpp common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604) 2026-03-21 18:43:35 +01:00
test-grammar-llguidance.cpp tool/ex/tests: consistently free ctx, then model (#18168) 2025-12-22 11:00:37 +01:00
test-grammar-parser.cpp common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604) 2026-03-21 18:43:35 +01:00
test-jinja.cpp tests : fix test-jinja-py Windows failures by bypassing command-line args [no ci] (#20483) 2026-03-18 10:43:31 +01:00
test-json-partial.cpp common : handle unicode during partial json parsing (#16526) 2025-10-12 16:18:47 +03:00
test-json-schema-to-grammar.cpp examples : fix empty items in json_schema_to_grammar.py [no ci] (#19968) 2026-03-10 14:38:18 +01:00
test-llama-archs.cpp model: mistral small 4 support (#20649) 2026-03-17 00:31:14 +01:00
test-llama-grammar.cpp common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604) 2026-03-21 18:43:35 +01:00
test-log.cpp common : use common_ prefix for common library functions (#9805) 2024-10-10 22:57:42 +02:00
test-lora-conversion-inference.sh cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
test-model-load-cancel.cpp llama : update llama_model API names (#11063) 2025-01-06 10:55:18 +02:00
test-mtmd-c-api.c mtmd : add C public API (#13184) 2025-05-04 23:43:42 +02:00
test-opt.cpp tests : fix test-opt with GGML_BACKEND_DL (#15599) 2025-08-26 22:14:38 +02:00
test-peg-parser.cpp Autoparser - complete refactoring of parser architecture (#18675) 2026-03-06 21:01:00 +01:00
test-quantize-fns.cpp cleanup : remove unused untested code and improve consistency 2026-03-22 02:44:56 -04:00
test-quantize-perf.cpp ci: run the x64 and arm ci on the github machines instead (#16183) 2025-09-25 08:06:06 +03:00
test-quantize-stats.cpp server: introduce API for serving / loading / unloading multiple models (#17470) 2025-12-01 19:41:04 +01:00
test-reasoning-budget.cpp common/parser: handle reasoning budget (#20297) 2026-03-11 10:26:12 +01:00
test-regex-partial.cpp common/grammar : replace problematic backtracking regex `[\s\S]*` (#18342) 2026-01-03 16:02:43 -06:00
test-rope.cpp ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805) 2025-11-11 13:33:24 +02:00
test-sampling.cpp sampling : optimize samplers by reusing bucket sort (#15665) 2025-08-31 20:41:02 +03:00
test-state-restore-fragmented.cpp kv-cache: Fix state restore fragmented cache (#17982) 2025-12-15 19:28:35 +02:00
test-thread-safety.cpp server : support unified cache across slots (#16736) 2025-11-02 18:14:04 +02:00
test-tokenizer-0.cpp tool/ex/tests: consistently free ctx, then model (#18168) 2025-12-22 11:00:37 +01:00
test-tokenizer-0.py py : logging and flake8 suppression refactoring (#7081) 2024-05-05 08:07:48 +03:00
test-tokenizer-0.sh model : add Jina Embeddings v5 Nano (partial EuroBERT) support (#19826) 2026-02-26 12:14:09 +01:00
test-tokenizer-1-bpe.cpp tool/ex/tests: consistently free ctx, then model (#18168) 2025-12-22 11:00:37 +01:00
test-tokenizer-1-spm.cpp tool/ex/tests: consistently free ctx, then model (#18168) 2025-12-22 11:00:37 +01:00
test-tokenizer-random.py ci : switch from pyright to ty (#20826) 2026-03-21 08:54:34 +01:00
test-tokenizers-repo.sh devops: add s390x & ppc64le CI (#15925) 2025-09-27 02:03:33 +08:00
testing.h common : implement new jinja template engine (#18462) 2026-01-16 11:22:06 +01:00