Jan Wassenberg
ad790d89d1
Fix DASSERT - TiledBatch requires at least 2 vectors.
...
Also use shorthand for weight types.
PiperOrigin-RevId: 643958371
2024-06-17 04:29:01 -07:00
Ray Smith
e0afdfa8fb
Added bias vector addition to MatMul
...
PiperOrigin-RevId: 643385381
2024-06-14 10:25:16 -07:00
Jan Wassenberg
29c0c574e6
Integrate matmul into FFW: 4.3x prefill speedup
...
```
before, bf16:
27.2929 prefill tokens / sec
17.2114 tokens / sec
after, bf16
116.496 prefill tokens / sec
17.5391 tokens / sec
```
PiperOrigin-RevId: 643328437
2024-06-14 06:32:26 -07:00
Ray Smith
198326a682
Removed now redundant non-batch matmul
...
PiperOrigin-RevId: 643317187
2024-06-14 05:13:36 -07:00
Andrey Vlasov
b17631c95f
Implement a missing (bf16, f32) tiled MatMul kernel.
...
PiperOrigin-RevId: 643313676
2024-06-14 04:54:40 -07:00
Ray Smith
ea525da967
Added MatMul_4x4_Batch which is MatMul_4x4, but with the first template arg moved to the first function arg, so the batch size (num A rows) can be variable at run-time.
...
PiperOrigin-RevId: 643017973
2024-06-13 09:05:40 -07:00
The gemma.cpp Authors
1b40619864
Increase parallelism in ops_test
...
PiperOrigin-RevId: 643013415
2024-06-13 08:50:41 -07:00
Andrey Vlasov
38eb452b94
Support mixed (bf16, sfp) tiled MatMul. Same sfp-decompress strategy as in (f32,
...
sfp) tiled MatMul.
PiperOrigin-RevId: 642901844
2024-06-13 02:07:21 -07:00
The gemma.cpp Authors
f467670de7
Implement float * SfpStream matmul by decompressing 4 * kColsA_RowsB -sized chunks of the second matrix.
...
PiperOrigin-RevId: 642533996
2024-06-12 01:11:59 -07:00
Phil Culliton
b6565e3bf6
Update AssertClose for large matrices and add large matrix test
...
PiperOrigin-RevId: 642277221
2024-06-11 08:22:47 -07:00
Phil Culliton
c5bcb5438c
Fix for transpose matrix creation and additional tests
...
PiperOrigin-RevId: 641868053
2024-06-10 05:24:04 -07:00
Phil Culliton
d985d8b867
Shifting large matrix init to heap in ops_test.cc
...
PiperOrigin-RevId: 641311100
2024-06-07 11:38:42 -07:00
Paul Chang
6c0be20fa6
Fix Softmax on SVE
...
PiperOrigin-RevId: 640947138
2024-06-06 10:39:30 -07:00
The gemma.cpp Authors
39d4115717
Implement mixed mode matmul: f32 * bf16
...
PiperOrigin-RevId: 640940962
2024-06-06 10:21:46 -07:00
Phil Culliton
e71d82ead9
Fix for GenerateZeroMat call in TestTiledMatMul
...
PiperOrigin-RevId: 640180868
2024-06-04 09:32:23 -07:00
Jan Wassenberg
4f9155d8c6
Add bf16 matmul support, update naming+test
...
Avoid int32, which can easily overflow for large matrices.
Also fix IDE warning in sfp-inl.
PiperOrigin-RevId: 640149845
2024-06-04 07:41:46 -07:00
Phil Culliton
c616abe628
Unrolled / tiled 4x4 MatMul
...
PiperOrigin-RevId: 638384686
2024-05-29 13:02:35 -07:00
Zoltan Szabadka
542ad0973a
Fix normalization in Softmax function.
2024-05-24 08:58:31 +00:00
Jan Wassenberg
f6d02b2870
Fix RecurrentGemma (refs #166 ) - one Dot was ignoring scale.
...
Remove extra Dot() overload
MatVecAdd always adds, use MatVecT<kAdd> if conditional.
Remove ununsed MatVecAddLoop and MatVecLoop
No longer tsan-verify even_odd
PiperOrigin-RevId: 631377279
2024-05-07 04:40:42 -07:00
Phil Culliton
28ca001d5e
Matmul and test functions
...
PiperOrigin-RevId: 630373984
2024-05-03 06:39:36 -07:00
Zoltan Szabadka
9a2682d544
Use more parallelism in the QKV projections of the MHA block.
...
We compute all three projections with one MatVec and then copy
the kv part to the cache.
Benchmark results for 7b-it model that uses MHA blocks (summarization with
1600 tokens for prefill and essay writing with 500 tokens for generation):
```
Prefill speed Generation speed
Num threads BEFORE AFTER BEFORE AFTER
32 13.75 t/s 14.80 t/s 9.22 t/s 9.77 t/s
64 19.89 t/s 24.83 t/s 12.46 t/s 13.66 t/s
```
2024-05-02 13:46:45 +00:00
Jan Wassenberg
12fb2f05cf
Add per-thread even_odd storage for #166 .
...
Also inline ProjQ and ProjKV lambdas,
add missing includes/deps for ops_test.
PiperOrigin-RevId: 629460608
2024-04-30 10:42:23 -07:00
Jan Wassenberg
a982ec1287
Move code to gemma/ so we can remove error-prone copybara: comments.
...
Also fix includes and Lint warnings.
PiperOrigin-RevId: 623127487
2024-04-09 04:45:42 -07:00