Commit Graph

7 Commits

Author SHA1 Message Date
Jan Wassenberg 301dc8067a Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul
Supports converting all weight/activation formats to native MulT (bf16/f32)

Also:
- ConstMat/MutableMat for const correctness
- Move RowVectorBatch to allocator.h so it can be used from Matmul
- Add matmul.h so MatMulEnv can be used from Activations
- Remove kMaxThreads, detect from PerClusterPools
- Build fix: -inl.h files must be textual_hdrs, and highway.h should precede -inl.h

```
zen4 new
64, 24576, 3072, add=0, MatTA=bf16, MatTB=sfp:   616.6 GFLOPS.
64, 3072, 24576, add=0, MatTA=bf16, MatTB=sfp:   460.7 GFLOPS.
64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp:    598.6 GFLOPS.
64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp:    435.6 GFLOPS.

zen4 old
64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp:    257.5 GFLOPS.
64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp:    231.9 GFLOPS.
```

PiperOrigin-RevId: 663729812
2024-08-16 07:52:20 -07:00
Jan Wassenberg 8e028632f7 0.98x prefill: refactor in prep for cache blocking.
Slower because we now init tiles of C and accumulate into them.

Also remove unused var in optimize_test and use BF16 typedef.

PiperOrigin-RevId: 662115916
2024-08-12 09:26:29 -07:00
Jan Wassenberg a24eda8d02 Split matmul into matvec; add large matrix benchmark
Rename var names to row/col for more clarity.
Better estimate error tolerance via max abs col sum.

PiperOrigin-RevId: 657601791
2024-07-30 08:29:11 -07:00
Jan Wassenberg 6ea4232b2e MatMul cleanup: Mat struct, simplify args.
Add large benchmark to test, use 4 threads, skip some targets.
Also use Traits::Name instead of typeid.

PiperOrigin-RevId: 657496185
2024-07-30 01:55:50 -07:00
Jan Wassenberg f27683152c 1.05x prefill speedup: matvec -> matmul for !MHA
Also add C_stride and make shape normal non-template arguments.

PiperOrigin-RevId: 657285945
2024-07-29 12:18:06 -07:00
Jan Wassenberg 2721f54446 Add offset arg to MatMul, rename, Matmul for logits = ~1.1x decode speedup
PiperOrigin-RevId: 657167257
2024-07-29 05:34:26 -07:00
Jan Wassenberg 85cac13fb1 Split up ops.h into ops/ops-inl and matmul-inl
PiperOrigin-RevId: 654068303
2024-07-19 11:21:48 -07:00
Renamed from gemma/ops_test.cc (Browse further)