Jan Wassenberg
229bd078a1
1.29x speedup: bf16 C1/C2. Extend most ops to any type, expand test coverage.
Also increase the dot_test.cc range for Zen4 and the matmul_test tolerance (which was failing in some configs).
PiperOrigin-RevId: 801789922
2025-09-01 06:34:04 -07:00
Jan Wassenberg
cd80d8b24d
Speed up builds by skipping rarely used targets
Centralize the previous target-skipping code into GEMMA_DISABLED_TARGETS.
PiperOrigin-RevId: 772433723
2025-06-17 05:44:20 -07:00
Jan Wassenberg
cf4d7ceb82
1.16x decode speedup: remove last MatVec in Attention
Precompute row pointers.
Remove no longer used MHA support; QStride -> qkv_dim.
Remove RowPtr from MatMul interface, use only MatPtrT.
Require opt-in define for NUQ to speed up builds.
Also fix io.cc on Windows.
PiperOrigin-RevId: 766228108
2025-06-02 09:40:29 -07:00
Jan Wassenberg
3890eb5412
Remove backprop/
Also remove MatPtrT::Packed(); use PackedScale1 instead where const, or Row(0).
PiperOrigin-RevId: 764243198
2025-05-28 07:01:17 -07:00
Jan Wassenberg
2038dfd9cc
Minor: rename compression/shared -> types.h
PiperOrigin-RevId: 758199851
2025-05-13 06:53:21 -07:00