Jan Wassenberg
85cac13fb1
Split up ops.h into ops/ops-inl and matmul-inl
...
PiperOrigin-RevId: 654068303
2024-07-19 11:21:48 -07:00
Daniel Keysers
e87e65ca45
Add scale parameter to MatMul.
...
Add accessor to CompressedArray that asserts the scale is 1 and use it.
PiperOrigin-RevId: 653604840
2024-07-18 06:58:56 -07:00
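The scale handling in the commit above can be sketched as follows: compressed weights carry a per-tensor scale, which must be folded back into the matmul result. This is a scalar Python stand-in, not the gemma.cpp C++ code; the function name is illustrative.

```python
# Sketch: a per-tensor scale factors into the matmul as
# C[i][j] = scale * sum_k A[i][k] * B[k][j].
def matmul_with_scale(a, b, scale):
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[scale * sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]
```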
Jan Wassenberg
992a2cbbc0
De-templatize Activations, add RowVectorBatch class
...
Also remove most kBatchSize args.
PiperOrigin-RevId: 653185525
2024-07-17 04:38:15 -07:00
Jan Wassenberg
edaf61b983
SVE build fix: avoid capturing vectors directly.
...
Also use the V typedef instead of auto in more places.
PiperOrigin-RevId: 651423685
2024-07-11 08:43:56 -07:00
Jan Wassenberg
be765afce2
Simplify matmul: only 2 overloads
...
Also add StoreHorizontalSumsMaybeAdd wrapper function,
move MatMulSlowBatch into test.
1.02-1.06x speedup.
PiperOrigin-RevId: 651394791
2024-07-11 06:58:42 -07:00
Andrey Vlasov
3e92088595
Remove allocation from GEMM_4x4_Tile when decoding compressed weights by implementing
...
SfpCodec::Dec2F and CompressTraits<T>::Decompress2 for all supported types. This also allows removing one of the specializations of GEMM_4x4_Tile, handling compressed MatB with one function. As before, even when MatA is bf16, 32-bit registers are used for computations.
Measurements for a 2b-it sfp-encoded model on an AMD Ryzen Threadripper PRO 3945WX 12-Cores:
baseline:
```
32.6254 prefill tokens / sec
8.91429 tokens / sec
115 milliseconds time to first token
```
this change:
```
54.3045 prefill tokens / sec
16.8191 tokens / sec
56 milliseconds time to first token
```
PiperOrigin-RevId: 651369694
2024-07-11 05:13:39 -07:00
Jan Wassenberg
e588a7f45d
Add config for att/final cap, skip max-subtract. Fixes #278
...
Also update includes/deps for backprop/.
PiperOrigin-RevId: 648399222
2024-07-01 09:45:26 -07:00
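The attention/final cap in the commit above refers to logit soft-capping, which squashes values into (-cap, cap); because the result is bounded, the usual max-subtraction before the subsequent softmax can be skipped. A minimal Python sketch of the arithmetic (not the Highway SIMD implementation):

```python
import math

# Soft-capping: cap * tanh(x / cap) keeps every value in (-cap, cap).
def softcap(xs, cap):
    return [cap * math.tanh(x / cap) for x in xs]
```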
Jan Wassenberg
48ebba8b7a
Code cleanup
...
- Simplify template arg list, enable deduction
- add missing hn:: on "Lanes"
- 1.0f suffix
- move RMSNormBatched into ops.h
- static constexpr -> constexpr
- concrete type instead of LayerT, WeightArrayT
- inline GetWeights
- remove if (runtime_config.verbosity
- merge AllocatePrefill and AllocateDecode
- remove bf_ffw_hidden
PiperOrigin-RevId: 644931277
2024-06-20 01:10:24 -07:00
Jan Wassenberg
a07f60c9a1
1.15x 7b sfp prefill speedup: Matmul in attention
...
2b bf16:
prefill 114.456 -> 115.222
decode 16.8847 -> 16.9987
7b sfp:
prefill 18.8575 -> 21.7325
decode 5.68428 -> 5.79791
PiperOrigin-RevId: 644283676
2024-06-18 01:00:51 -07:00
Ray Smith
e0afdfa8fb
Added bias vector addition to MatMul
...
PiperOrigin-RevId: 643385381
2024-06-14 10:25:16 -07:00
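The bias addition described above broadcasts one bias vector across every output row. A scalar Python sketch under that assumption (names illustrative, not the MatMul API):

```python
# Sketch: C = A @ B + bias, with bias added to each output row.
def matmul_add_bias(a, b, bias):
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) + bias[j]
             for j in range(cols)]
            for i in range(rows)]
```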
Jan Wassenberg
29c0c574e6
Integrate matmul into FFW: 4.3x prefill speedup
...
```
before, bf16:
27.2929 prefill tokens / sec
17.2114 tokens / sec
after, bf16:
116.496 prefill tokens / sec
17.5391 tokens / sec
```
PiperOrigin-RevId: 643328437
2024-06-14 06:32:26 -07:00
Ray Smith
198326a682
Removed now redundant non-batch matmul
...
PiperOrigin-RevId: 643317187
2024-06-14 05:13:36 -07:00
Andrey Vlasov
b17631c95f
Implement a missing (bf16, f32) tiled MatMul kernel.
...
PiperOrigin-RevId: 643313676
2024-06-14 04:54:40 -07:00
Jan Wassenberg
d3c6a45b59
Major duplicated code reduction in test/benchmarks
...
Helper functions to tokenize/wrap
Move LayersOutputFunc into RuntimeConfig
AcceptFunc passes the probability
Implement StringFromType using the parser, and verify results match
PiperOrigin-RevId: 643255119
2024-06-14 00:16:25 -07:00
Ray Smith
ea525da967
Added MatMul_4x4_Batch, which is MatMul_4x4 but with the first template arg moved to the first function arg, so the batch size (number of A rows) can vary at run-time.
...
PiperOrigin-RevId: 643017973
2024-06-13 09:05:40 -07:00
Andrey Vlasov
38eb452b94
Support mixed (bf16, sfp) tiled MatMul. Same sfp-decompress strategy as in (f32,
...
sfp) tiled MatMul.
PiperOrigin-RevId: 642901844
2024-06-13 02:07:21 -07:00
The gemma.cpp Authors
2a0e6ee976
Fix numerical issue in Softcap by subtracting max.
...
Also update test threshold.
PiperOrigin-RevId: 642587468
2024-06-12 05:42:16 -07:00
The gemma.cpp Authors
f467670de7
Implement float * SfpStream matmul by decompressing 4 * kColsA_RowsB-sized chunks of the second matrix.
...
PiperOrigin-RevId: 642533996
2024-06-12 01:11:59 -07:00
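The chunk-wise strategy above decodes a small piece of the compressed matrix at a time and consumes it immediately in the dot product, instead of decompressing all of B up front. A Python sketch where the decode function is a stand-in, not the actual SFP codec:

```python
# Sketch: dot product that decodes B's compressed column in small chunks.
def dot_with_chunked_decode(a_row, b_col_compressed, decode, chunk=4):
    total = 0.0
    for start in range(0, len(b_col_compressed), chunk):
        decoded = decode(b_col_compressed[start:start + chunk])
        for off, v in enumerate(decoded):
            total += a_row[start + off] * v
    return total
```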
Jan Wassenberg
3e2396f98c
Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc
...
accept_token: allow default, check if empty when using
allow mixing sample_func and stream_func, call the latter after the former
Also fix missing includes/deps.
PiperOrigin-RevId: 642240012
2024-06-11 05:53:10 -07:00
Daniel Keysers
c557ad23a8
Adds simple-loop versions of missing batched functions.
...
PiperOrigin-RevId: 642189741
2024-06-11 02:14:02 -07:00
Paul Chang
6c0be20fa6
Fix Softmax on SVE
...
PiperOrigin-RevId: 640947138
2024-06-06 10:39:30 -07:00
The gemma.cpp Authors
39d4115717
Implement mixed mode matmul: f32 * bf16
...
PiperOrigin-RevId: 640940962
2024-06-06 10:21:46 -07:00
Jan Wassenberg
57c2cd8b52
Simplifications: remove GemmaInterface and GemmaImpl
...
Split common and weights into separate lib
Remove common-inl (does not have to be SIMD code), activations.cc
Centralize switch(Model) to avoid duplication
Move CompressWeightsT to compress_weights.cc
Move LoadWeights to weights.cc
PiperOrigin-RevId: 640869202
2024-06-06 05:54:21 -07:00
Paul Chang
175e389c3c
Revert to HWY_ASSERT for lane constraints, qualify hn::Add
...
PiperOrigin-RevId: 640193239
2024-06-04 10:10:18 -07:00
Jan Wassenberg
4f9155d8c6
Add bf16 matmul support, update naming+test
...
Avoid int32, which can easily overflow for large matrices.
Also fix IDE warning in sfp-inl.
PiperOrigin-RevId: 640149845
2024-06-04 07:41:46 -07:00
Zoltan Szabadka
8567978541
Address review comments
2024-06-04 08:37:54 +00:00
Zoltan Szabadka
36e4d8bbfe
Add first version of backpropagation support.
...
This is still in progress / experimental: it is currently only
implemented for normal gemma MQA attention layers, and no
parallelism has been added yet for the backward pass.
Since we need to remember all activations from all layers, the
forward pass was also reimplemented with a new activation data
structure.
2024-06-04 08:37:49 +00:00
Paul Chang
5feacf120c
static_assert shape constraints in MatMul 4x4
...
PiperOrigin-RevId: 639069345
2024-05-31 10:02:45 -07:00
Phil Culliton
c616abe628
Unrolled / tiled 4x4 MatMul
...
PiperOrigin-RevId: 638384686
2024-05-29 13:02:35 -07:00
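The unrolled/tiled kernel above keeps a 4x4 block of accumulators live while streaming once through the shared dimension. A scalar Python sketch of that tiling idea (illustrative only; the real kernel uses Highway SIMD registers):

```python
# Sketch: 4x4-tiled matmul. Each (i0, j0) tile accumulates a 4x4 block
# "in registers" over a single pass through the shared dimension k.
def matmul_4x4_tiles(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    assert rows % 4 == 0 and cols % 4 == 0
    c = [[0.0] * cols for _ in range(rows)]
    for i0 in range(0, rows, 4):          # tile row
        for j0 in range(0, cols, 4):      # tile column
            acc = [[0.0] * 4 for _ in range(4)]
            for k in range(inner):        # stream the shared dimension once
                for i in range(4):
                    for j in range(4):
                        acc[i][j] += a[i0 + i][k] * b[k][j0 + j]
            for i in range(4):
                for j in range(4):
                    c[i0 + i][j0 + j] = acc[i][j]
    return c
```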
Zoltan Szabadka
542ad0973a
Fix normalization in Softmax function.
2024-05-24 08:58:31 +00:00
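For reference, a numerically stable softmax subtracts the max before exponentiating and then normalizes so the outputs sum to 1 (the normalization step this commit fixes). A minimal Python sketch:

```python
import math

# Stable softmax: subtract max for numerical safety, then normalize.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```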
Apoorv Reddy
1aaf3b3aae
Documenting the RoPE implementation.
...
PiperOrigin-RevId: 636175297
2024-05-22 08:26:29 -07:00
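The RoPE scheme being documented rotates pairs of features by a position-dependent angle. The pairing convention (adjacent pairs vs. split halves) varies between implementations, so treat this half-split Python version as an illustrative sketch, not the gemma.cpp code:

```python
import math

# Sketch of rotary position embeddings: feature i is paired with i + d/2
# and the pair is rotated by theta = pos / base^(2i/d).
def rope(x, pos, base=10000.0):
    d = len(x)
    half = d // 2
    out = list(x)
    for i in range(half):
        theta = pos / (base ** (2 * i / d))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out
```

Rotations preserve the vector's norm, which is a handy sanity check.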
Jan Wassenberg
22fe9809ac
Fix SVE build: add missing hn::
...
PiperOrigin-RevId: 632481097
2024-05-10 06:49:26 -07:00
Jan Wassenberg
c5c9fc300c
Enable even/odd for SFP. Refs #166
...
Disable it for float32 because there is not enough benefit.
PiperOrigin-RevId: 631788326
2024-05-08 07:09:06 -07:00
Jan Wassenberg
f6d02b2870
Fix RecurrentGemma (refs #166) - one Dot was ignoring scale.
...
Remove extra Dot() overload
MatVecAdd always adds; use MatVecT<kAdd> if the add is conditional.
Remove unused MatVecAddLoop and MatVecLoop
No longer tsan-verify even_odd
PiperOrigin-RevId: 631377279
2024-05-07 04:40:42 -07:00
Phil Culliton
28ca001d5e
Matmul and test functions
...
PiperOrigin-RevId: 630373984
2024-05-03 06:39:36 -07:00
Copybara-Service
6eeef2e2d9
Merge pull request #166 from samkaufman:deinterleave-vecs
...
PiperOrigin-RevId: 630360778
2024-05-03 05:23:31 -07:00
Zoltan Szabadka
9a2682d544
Use more parallelism in the QKV projections of the MHA block.
...
We compute all three projections with one MatVec and then copy
the kv part to the cache.
Benchmark results for the 7b-it model that uses MHA blocks (summarization with
1600 tokens for prefill and essay writing with 500 tokens for generation):
```
              Prefill speed           Generation speed
Num threads   BEFORE      AFTER       BEFORE      AFTER
32            13.75 t/s   14.80 t/s   9.22 t/s    9.77 t/s
64            19.89 t/s   24.83 t/s   12.46 t/s   13.66 t/s
```
2024-05-02 13:46:45 +00:00
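The fusion described above stacks W_q, W_k, and W_v so one matrix-vector product yields all three projections, which are then sliced apart (with the K/V part copied into the cache). A Python sketch with illustrative names:

```python
# Sketch: fused QKV projection. Stacking the three weight matrices turns
# three MatVecs into one, after which the result is split into q, k, v.
def fused_qkv(w_q, w_k, w_v, x):
    w = w_q + w_k + w_v                  # stack rows into one big matrix
    y = [sum(wi[j] * x[j] for j in range(len(x))) for wi in w]
    nq, nk = len(w_q), len(w_k)
    return y[:nq], y[nq:nq + nk], y[nq + nk:]
```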
Sam Kaufman
4a6173d929
Remove unused vars.
2024-05-02 00:41:44 -07:00
Sam Kaufman
564937ede6
Merge branch 'dev' into deinterleave-vecs
2024-04-30 16:23:04 -07:00
Sam Kaufman
2829ef17ad
Check for HWY_NATIVE_DOT_BF16.
2024-04-30 15:19:28 -07:00
Sam Kaufman
59ebecce22
Fix: specialized MatVecAdd was never called.
2024-04-30 15:17:27 -07:00
Jan Wassenberg
12fb2f05cf
Add per-thread even_odd storage for #166.
...
Also inline ProjQ and ProjKV lambdas,
add missing includes/deps for ops_test.
PiperOrigin-RevId: 629460608
2024-04-30 10:42:23 -07:00
Sam Kaufman
6a78a23f4c
Abstracted some MatVecAdd spec. dupes.
2024-04-29 16:23:38 -07:00
Sam Kaufman
f608337fef
Remove Bf16ToF32EO and use PromoteEvenTo and PromoteOddTo.
2024-04-29 14:13:07 -07:00
Sam Kaufman
aa0b113214
(VecT*) to static_cast<VecT*>.
2024-04-29 12:53:47 -07:00
Sam Kaufman
5cb63346aa
supports_eo -> kSupportsEvenOdd
2024-04-29 12:51:35 -07:00
Sam Kaufman
0816a1070d
Even-odd layout MatVecs for bf16 weights.
2024-04-28 20:09:25 -07:00
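The even/odd layout above avoids lane-interleaving shuffles by accumulating even-indexed and odd-indexed bf16 lanes in separate partial sums (mirroring PromoteEvenTo / PromoteOddTo) and combining them at the end. A scalar Python stand-in for the SIMD version:

```python
# Sketch: dot product split into even-lane and odd-lane partial sums.
def dot_even_odd(w, x):
    even = sum(w[i] * x[i] for i in range(0, len(w), 2))
    odd = sum(w[i] * x[i] for i in range(1, len(w), 2))
    return even + odd
```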
Jan Wassenberg
a982ec1287
Move code to gemma/ so we can remove error-prone copybara: comments.
...
Also fix includes and Lint warnings.
PiperOrigin-RevId: 623127487
2024-04-09 04:45:42 -07:00