Krzysztof Ostrowski
b3239bf509
Internal change.
...
PiperOrigin-RevId: 681530185
2024-10-02 11:33:06 -07:00
Jan Wassenberg
96d2ab7d31
Minor fix to profiler zone and add comment
...
PiperOrigin-RevId: 681350546
2024-10-02 01:37:50 -07:00
Jan Wassenberg
7d9fcda0d8
-467ms startup: parallel Reshape
...
Also split Softmax into Argmax helper, add comments;
add profiler zones + fix IDE warning
PiperOrigin-RevId: 680954573
2024-10-01 04:11:35 -07:00
Jan Wassenberg
2d14d796e3
1.09x decode speedup for topk=1/temp0: fuse softmax and sample
...
PiperOrigin-RevId: 680589099
2024-09-30 08:37:41 -07:00
Jan Wassenberg
897f902d28
Fix include order, required to build with profiler enabled
...
PiperOrigin-RevId: 680574177
2024-09-30 07:52:50 -07:00
Daniel Keysers
606427022c
Fix compiler errors when trying to generate (unused) code for the ConfigNoVit struct.
...
PiperOrigin-RevId: 679049377
2024-09-26 01:55:26 -07:00
RangerUFO
d1010337c3
Fix prefix-LM mode assertion
2024-09-25 22:22:28 +08:00
Daniel Keysers
f8835fe4a4
Add support for PaliGemma Vision-LM (224x224) to gemma.cpp
...
See https://arxiv.org/abs/2407.07726 for a description of the model.
Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that.
PiperOrigin-RevId: 677841119
2024-09-23 10:09:38 -07:00
Jan Wassenberg
8c0a8834c1
Major compression update, arbitrary-len unpack + new Dot
...
Compression:
* Implement {any packed} x {bf16, f32} 'Load2' and DecompressAndZeroPad
* New compression test for all packed formats, add to GEMMA_TEST_FILES, remove from sfp/nuq_test
* Decompress->DecompressAndZeroPad, use PackedSpan for args with bounds checking
* NUQ: support arbitrary-length enc/dec
* New compression/shared, remove sfp.h and nuq.h
* Move Store2 into Traits and provide Compress2 wrapper
* Remove unused Decompress()-with-pool overload
* Simplify CompressedArrayLen, rename to CompressedArrayElements
* Remove unused DistortionStats b_l1_
Misc:
* Add compensated and Kahan dot, support any length
* Use same Dot function everywhere
* Move exact arithmetic functions into fp_arith
* use FloatPtr and MatPtr typedefs in tests; less stack usage
* Rename args to packed/raw
* Remove Traits::Name, instead TypeName<T>()
* Move kMaxSFP and kClusters/kGroupSize into Sfp/NuqStream
PiperOrigin-RevId: 672868468
2024-09-10 02:22:19 -07:00
Jan Wassenberg
c29e9752c7
Refactor/cleanup, remove even_odd
...
* New compression/shared.h, remove sfp.h
* Remove unused DistortionStats b_l1_
* Move exact arithmetic functions into fp_arith
* Remove even_odd optimization for MatVec (mostly unused)
* use BF16 typedef more widely
* Add kMaxSFP constant
PiperOrigin-RevId: 670996386
2024-09-04 09:25:13 -07:00
Daniel Keysers
a8e08778d4
Add an additional QueryModel() overload to GemmaEnv.
...
Use args only in GemmaEnv constructor, store everything else in RuntimeConfig.
Add runtime option to turn off thread spinning.
PiperOrigin-RevId: 670467320
2024-09-03 02:25:19 -07:00
Zoltan Szabadka
f6abbab3a4
Fix asan failure in local attention computation.
...
PiperOrigin-RevId: 670207380
2024-09-02 07:06:10 -07:00
Jan Wassenberg
4033ed9e78
Avoid duplication of RMSNorm, support all activation/weight types
...
Add test for RMSNorm
Rename VectorizedRopeAndMulBy -> RopeAndMulBy
Move test_util to util/
PiperOrigin-RevId: 668332927
2024-08-28 01:26:55 -07:00
Daniel Keysers
18e6012872
Fix prefill for batched queries.
...
This lets gemma_test/GeographyBatched pass now also for gemma2-27B.
PiperOrigin-RevId: 664827485
2024-08-19 08:50:42 -07:00
Apoorv Reddy
c6eb3b6f0d
VectorizedRopeAndMulBy.
...
~8x reduction (tested on few prompts) in Rope.
~3.8% prefill latency improvement.
~2.6% decode latency improvement.
PiperOrigin-RevId: 664650108
2024-08-18 23:17:01 -07:00
Paul Chang
773333e5be
Expose underlying model configuration: number of layers, heads, etc.
...
PiperOrigin-RevId: 663747853
2024-08-16 09:03:24 -07:00
Jan Wassenberg
301dc8067a
Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul
...
Supports converting all weight/activation formats to native MulT (bf16/f32)
Also:
- ConstMat/MutableMat for const correctness
- Move RowVectorBatch to allocator.h so it can be used from Matmul
- Add matmul.h so MatMulEnv can be used from Activations
- Remove kMaxThreads, detect from PerClusterPools
- Build fix: -inl.h files must be textual_hdrs, and highway.h should precede -inl.h
```
zen4 new
64, 24576, 3072, add=0, MatTA=bf16, MatTB=sfp: 616.6 GFLOPS.
64, 3072, 24576, add=0, MatTA=bf16, MatTB=sfp: 460.7 GFLOPS.
64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp: 598.6 GFLOPS.
64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp: 435.6 GFLOPS.
zen4 old
64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp: 257.5 GFLOPS.
64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp: 231.9 GFLOPS.
```
PiperOrigin-RevId: 663729812
2024-08-16 07:52:20 -07:00
Paul Chang
b9ed12a325
Support directly observing activations, partially replacing LayersOutputFunc
...
LayersOutputFunc is no longer invoked for "blocks" and "final_norm" outputs.
Instead, we directly expose the Activations structure.
PiperOrigin-RevId: 663409316
2024-08-15 12:39:07 -07:00
Jan Wassenberg
22995c699d
Simplify pos handling, auto-increment output arg
...
- no longer multiply by num_queries
- remove unused interleaved prompts
- Rename to Queries*
- Rename batch_start/interleaved_pos/pos to queries_pos
PiperOrigin-RevId: 663331823
2024-08-15 09:25:26 -07:00
Copybara-Service
6763afcd1c
Merge pull request #348 from ufownl:feature/start_pos_per_query_reopen
...
PiperOrigin-RevId: 662533529
2024-08-13 08:51:06 -07:00
RangerUFO
8c634f6486
Fix the position calculation issue in the generation phase
2024-08-12 18:50:23 +02:00
RangerUFO
730b6bfc94
Implement `start_pos` per query for batch interface
2024-08-12 18:50:23 +02:00
Jan Wassenberg
b831fa8482
1.3x prefill, 0.95x decode: matmul replacing last matvec
...
Before 38.28, 9.17 (with profiler enabled, prompt = 330 tok)
```
Gen.FFW : 15414 x 4692352 = 24.166318
Gen.Attention.SumHeads : 15414 x 1394804 = 7.183451 !!
Gen.Embedding : 361 x 49961894 = 6.026297
Gen.Attention.QKV : 15414 x 1005125 = 5.176546
Gen.Attention.DotSoftmax : 15414 x 885480 = 4.560357
RopeAndMulBy : 696528 x 11867 = 2.761818
```
After 49.80, 8.68
```
Gen.FFW : 14448 x 5312783 = 25.646868
Gen.Embedding : 338 x 63044815 = 7.119845
Gen.Attention.QKV : 14448 x 1115003 = 5.382557
Gen.Attention.DotSoftmax : 14448 x 897577 = 4.332957
RopeAndMulBy : 673344 x 11886 = 2.674156
Gen.Attention.SumHeads : 14448 x 518291 = 2.501993 !!
```
PiperOrigin-RevId: 662024085
2024-08-12 03:36:01 -07:00
Jan Wassenberg
282f73ec2f
Add pin flag to disable pinning. Refs #338
...
PiperOrigin-RevId: 661389171
2024-08-09 13:47:12 -07:00
Apoorv Reddy
fd1b0743a7
Rename Gemma9B and Gemma27B to Gemma2_9B and Gemma2_27B.
...
This is to make it clear that these models are part of the Gemma2 family of models.
PiperOrigin-RevId: 661181682
2024-08-09 02:09:06 -07:00
Jan Wassenberg
2ebbe4076f
1.03-1.08x decode speedup: precompute Rope theta, fuse
...
Split attention into functions, move into class.
Fuse Rope and MulBy, allow non-in-place version to avoid copy from q to KV.
Sink if() into MaybeLogitsSoftCap.
PiperOrigin-RevId: 661168418
2024-08-09 01:23:24 -07:00
The gemma.cpp Authors
27258b03e6
Improve performance logging
...
PiperOrigin-RevId: 660534330
2024-08-07 14:15:43 -07:00
Jan Wassenberg
5e433e774a
1.1x prefill speedup, revamp threading in preparation for hierarchical parallelism.
...
Limit thread counts to detected. Add max_clusters arg.
Update detection logic to check for smt0 - previously we pinned to some siblings.
PiperOrigin-RevId: 659755311
2024-08-05 18:50:09 -07:00
Phil Culliton
1982a6ba00
Internal change
...
PiperOrigin-RevId: 657831926
2024-07-30 20:24:54 -07:00
Jan Wassenberg
a24eda8d02
Split matmul into matvec; add large matrix benchmark
...
Rename var names to row/col for more clarity.
Better estimate error tolerance via max abs col sum.
PiperOrigin-RevId: 657601791
2024-07-30 08:29:11 -07:00
Paul Chang
d37c088e44
Extend LayersOutputFunc to take query index and auxillary int
...
PiperOrigin-RevId: 657574814
2024-07-30 06:53:56 -07:00
Jan Wassenberg
8b4915f321
Fix Windows build - macro conflict with param name
...
PiperOrigin-RevId: 657518587
2024-07-30 03:22:32 -07:00
Jan Wassenberg
6ea4232b2e
MatMul cleanup: Mat struct, simplify args.
...
Add large benchmark to test, use 4 threads, skip some targets.
Also use Traits::Name instead of typeid.
PiperOrigin-RevId: 657496185
2024-07-30 01:55:50 -07:00
Jan Wassenberg
f27683152c
1.05x prefill speedup: matvec -> matmul for !MHA
...
Also add C_stride and make shape normal non-template arguments.
PiperOrigin-RevId: 657285945
2024-07-29 12:18:06 -07:00
Jan Wassenberg
2721f54446
Add offset arg to MatMul, rename, Matmul for logits = ~1.1x decode speedup
...
PiperOrigin-RevId: 657167257
2024-07-29 05:34:26 -07:00
Jan Wassenberg
aaf51898b6
Major revamp #2 of Prefill: fix token order, parallel for multi-query
...
- Allocate only the required KV caches and activation batch size
- Add flags for batch sizes
- Const-correct interface: Span of const int.
- Also clean up the KVCache arg to a span.
- Move kPrefillBatchSize into RuntimeConfig and remove related global constants.
PiperOrigin-RevId: 655893197
2024-07-25 03:28:55 -07:00
Daniel Keysers
2346b5a434
Minor polishing: adding comments, renaming variables.
...
PiperOrigin-RevId: 655235006
2024-07-23 11:17:44 -07:00
Daniel Keysers
33334ad454
Fix msan uninitialized scale in optimize_test
...
PiperOrigin-RevId: 654817460
2024-07-22 10:50:25 -07:00
Jan Wassenberg
85cac13fb1
Split up ops.h into ops/ops-inl and matmul-inl
...
PiperOrigin-RevId: 654068303
2024-07-19 11:21:48 -07:00
Jan Wassenberg
5844e6a1e5
Cleanup: add wrapper functions and rename vars to interleaved
...
Simplifies the TransformerLayer function.
Use interleaved* instead of _and_queries.
PiperOrigin-RevId: 653929449
2024-07-19 02:04:11 -07:00
Jan Wassenberg
12016d31c3
Major Prefill/Generate cleanup, 1.3x Prefill speedup
...
This fixes TTFT, which was not including prefill.
PiperOrigin-RevId: 653690626
2024-07-18 11:16:46 -07:00
Jan Wassenberg
3fe79b3876
Fix msan uninitialized scale
...
PiperOrigin-RevId: 653655471
2024-07-18 09:42:31 -07:00
Daniel Keysers
e87e65ca45
Add scale parameter to MatMul.
...
Add accessor to CompressedArray that asserts the scale is 1 and use it.
PiperOrigin-RevId: 653604840
2024-07-18 06:58:56 -07:00
Daniel Keysers
5a751a9a44
Update gemma-27b to the correct query scaling.
...
PiperOrigin-RevId: 653201646
2024-07-17 05:43:52 -07:00
Jan Wassenberg
992a2cbbc0
De-templatize Activations, add RowVectorBatch class
...
Also remove most kBatchSize args.
PiperOrigin-RevId: 653185525
2024-07-17 04:38:15 -07:00
Daniel Keysers
ff34370aac
Simplify FFW by using MatMul_4x4_Batch_Add.
...
Affects only the griffin model, where prefill TPS improves by about 70%.
PiperOrigin-RevId: 652878176
2024-07-16 09:41:23 -07:00
Jan Wassenberg
cd530374b3
Further 1.02x prefill speedup from batch 64->512
...
Measured on SKX. Larger speedup expected for Zen4/SPR.
PiperOrigin-RevId: 652472928
2024-07-15 07:26:10 -07:00
The gemma.cpp Authors
c879133a5a
Increase the prefill batch size to 64.
...
PiperOrigin-RevId: 651754772
2024-07-12 06:28:37 -07:00
The gemma.cpp Authors
df3fb70802
Improve readability with RepeatedAttentionWindowSizes
...
PiperOrigin-RevId: 651431738
2024-07-11 09:11:46 -07:00
Jan Wassenberg
edaf61b983
SVE build fix: avoid capturing vectors directly.
...
Also use more V typedef instead of auto.
PiperOrigin-RevId: 651423685
2024-07-11 08:43:56 -07:00
Jan Wassenberg
be765afce2
Simplify matmul: only 2 overloads
...
Also add StoreHorizontalSumsMaybeAdd wrapper function,
move MatMulSlowBatch into test.
1.02-1.06x speedup.
PiperOrigin-RevId: 651394791
2024-07-11 06:58:42 -07:00
Andrey Vlasov
3e92088595
Remove allocation from GEMM_4x4_Tile when decoding compressed weights by implementing
...
SfpCodec::Dec2F and ComressTraits<T>::Decompress2 for all supported types. It also allows to remove one of the specializations of GEMM_4x4_Tile, handling compressed MatB with one function. As before even when MatA is bf16 it is using 32-bit registers for computations.
Measurements for a 2b-it sfp-encoded model on a AMD Ryzen Threadripper PRO 3945WX 12-Cores:
baseline:
```
32.6254 prefill tokens / sec
8.91429 tokens / sec
115 milliseconds time to first token
```
this change:
```
54.3045 prefill tokens / sec
16.8191 tokens / sec
56 milliseconds time to first token
```
PiperOrigin-RevId: 651369694
2024-07-11 05:13:39 -07:00
Kan Wu
f519ab6693
Refactor configurables.
...
PiperOrigin-RevId: 651259154
2024-07-10 21:30:58 -07:00
Andrey Vlasov
960ff4b4ec
Record time measurements in MatMul tests.
...
PiperOrigin-RevId: 651060711
2024-07-10 10:04:40 -07:00
Daniel Keysers
063bbaa683
Add more comments to attention computation (and some small restructuring).
...
PiperOrigin-RevId: 650929097
2024-07-10 02:39:07 -07:00
Jan Wassenberg
6a3f7cf3ea
Lint fix - string append, remove stale TODO
...
PiperOrigin-RevId: 650197468
2024-07-08 04:11:21 -07:00
Jan Wassenberg
cbb67b4ee0
Move benchmark_helper to evals/, weights_raw to compression/.
...
PiperOrigin-RevId: 650155983
2024-07-08 01:13:23 -07:00
Jan Wassenberg
438b1bace2
Fix handling of %c and %q if eot_string. Fixes #283 , thanks @ljcucc
...
PiperOrigin-RevId: 649651535
2024-07-05 07:54:00 -07:00
Jan Wassenberg
118e802b00
Fix gemma_test - moved to evals/.
...
PiperOrigin-RevId: 649338633
2024-07-04 02:04:05 -07:00
Jan Wassenberg
c7c3daa624
7x compile time speedup: shard gemma.cc
...
Use overloaded functions defined in gemma/instantiations.
Also split out activations.h.
PiperOrigin-RevId: 649053122
2024-07-03 06:35:04 -07:00
Daniel Keysers
a40165dea2
Small cleanups. Fixes gemma_test build.
...
PiperOrigin-RevId: 649008524
2024-07-03 03:13:38 -07:00
Kan Wu
7e4b20455e
Add sliding window attention for Gemma 2.
...
PiperOrigin-RevId: 648778253
2024-07-02 11:08:03 -07:00
Jan Wassenberg
09a7e75ead
Prep for sharding gemma.cc: split into kv_cache, tokenizer.
...
Move activations.h to backprop/ to make space for another activations.h.
PiperOrigin-RevId: 648744500
2024-07-02 09:31:06 -07:00
Jan Wassenberg
85fcd3cd80
Cleanup: add ModelInfo struct, remove gcpp::
...
PiperOrigin-RevId: 648707763
2024-07-02 07:11:15 -07:00
Jan Wassenberg
b1c1ec1d59
Use benchmark_helper in py bindings (adds BOS)
...
Also remove thread clamp (OK to be zero or large).
PiperOrigin-RevId: 648657155
2024-07-02 03:27:15 -07:00
Jan Wassenberg
e527e7662e
Remove unused kSystemPrompt
...
PiperOrigin-RevId: 648429567
2024-07-01 11:18:07 -07:00
Jan Wassenberg
af8eb2fde3
Declutter gemma/ directory, move binaries to evals/ and util/.
...
PiperOrigin-RevId: 648400795
2024-07-01 09:51:04 -07:00
Jan Wassenberg
e588a7f45d
Add config for att/final cap, skip max-subtract. Fixes #278
...
Also update includes/deps for backprop/.
PiperOrigin-RevId: 648399222
2024-07-01 09:45:26 -07:00
The gemma.cpp Authors
da7507e6f0
Add prompt batching to Gemma.cpp.
...
This CL adds a new function to Gemma that allows for batching of multiple prompts. The function takes a vector of prompts and returns a vector of responses. The prompts are processed in parallel, and the responses are returned in the same order as the prompts.
PiperOrigin-RevId: 648367559
2024-07-01 07:51:31 -07:00
Paul Chang
8ac5d66575
Introduce new Gemma 9B and 27B configs
...
PiperOrigin-RevId: 647299080
2024-06-27 06:45:24 -07:00
Paul Chang
78e96fdc70
Refactor model type / training tables, simplify reverse mapping
...
PiperOrigin-RevId: 647069372
2024-06-26 13:59:14 -07:00
The gemma.cpp Authors
7fc8ddf825
Fix a clang tidy warning
...
PiperOrigin-RevId: 646498062
2024-06-25 09:02:59 -07:00
The gemma.cpp Authors
12089417b5
Improve logging when running Gemma examples: fix the issue when max_tokens, max_generated_tokens and temperature were logging without any trailing space/newline.
...
PiperOrigin-RevId: 646014268
2024-06-24 02:00:34 -07:00
The gemma.cpp Authors
80b1347393
Skip the last RMSNormInplaceBatched in the Prefill phase.
...
That only modifies activations.x, but it is called with prefill_activations which are not used after the Prefill call.
PiperOrigin-RevId: 645391387
2024-06-21 08:04:22 -07:00
Copybara-Service
82f16087ba
Merge pull request #266 from ufownl:bugfix/kvcache
...
PiperOrigin-RevId: 645329504
2024-06-21 03:06:52 -07:00
RangerUFO
f7855251ea
Fix compilation errors in clang
...
It will occur in `ubuntu-latest` of GitHub Actions.
2024-06-21 13:40:40 +08:00
RangerUFO
d7787c8f6c
Fix KV cache size calculation error
2024-06-21 13:06:26 +08:00
Daniel Keysers
0570972d43
Fixing two typos.
...
PiperOrigin-RevId: 645103198
2024-06-20 11:33:12 -07:00
The gemma.cpp Authors
a85725614a
Refactor kCachePosSize and kCacheLayerSize into separate functors.
...
PiperOrigin-RevId: 645048519
2024-06-20 08:52:08 -07:00
Jan Wassenberg
48ebba8b7a
Code cleanup
...
- Simplify template arg list, enable deduction
- missing hn:: on " Lanes"
- 1.0f suffix
- move RMSNormBatched into ops.h
- static constexpr -> constexpr
- concrete type instead of LayerT, WeightArrayT
- inline GetWeights
- remove if (runtime_config.verbosity
- merge AllocatePrefill and AllocateDecode
- remove bf_ffw_hidden
PiperOrigin-RevId: 644931277
2024-06-20 01:10:24 -07:00
The gemma.cpp Authors
658fb3e506
Move test placeholder to a later pos.
...
PiperOrigin-RevId: 644808456
2024-06-19 13:24:10 -07:00
The gemma.cpp Authors
0e612d9a20
Split out common parts (embedder and transformer block) from Prefill() and Transformer() into separate functions.
...
PiperOrigin-RevId: 644455520
2024-06-18 11:24:56 -07:00
Paul Chang
d7d9d14f0e
Move kGriffinLayers into ConfigNoSSM, set kGemmaLayers directly
...
For regular (non-SSM) Gemma models, kGriffinLayers is by definition always zero
and kGemmaLayers is just the number of layers.
PiperOrigin-RevId: 644384531
2024-06-18 07:52:52 -07:00
Jan Wassenberg
70506b0a62
Fix debug_prompt and other binaries (internal init)
...
PiperOrigin-RevId: 644367683
2024-06-18 06:48:59 -07:00
Jan Wassenberg
15135f5b3d
Simplify Attention.
...
Shared kMHA, reuse from Activations,
inline Attn lambda, use QDim as the stride between successive Q.
PiperOrigin-RevId: 644343854
2024-06-18 05:08:12 -07:00
Jan Wassenberg
2ac47e4a06
Fix Py binding/run_example: use GemmaEnv
...
PiperOrigin-RevId: 644318962
2024-06-18 03:20:22 -07:00
Jan Wassenberg
a07f60c9a1
1.15x 7b sfp prefill speedup: Matmul in attention
...
2b bf16:
prefill 114.456 -> 115.222
decode 16.8847 -> 16.9987
7b sfp:
prefill 18.8575 -> 21.7325
decode 5.68428 -> 5.79791
PiperOrigin-RevId: 644283676
2024-06-18 01:00:51 -07:00
Jan Wassenberg
704d936764
Further simplification to ForEachTensor, thanks I.K.
...
PiperOrigin-RevId: 643996210
2024-06-17 07:12:26 -07:00
Jan Wassenberg
7d0720675f
Move raw_weights into separate header, used mainly by compress_weights.
...
Fix warnings in backprop/* (include)
PiperOrigin-RevId: 643983136
2024-06-17 06:17:02 -07:00
Jan Wassenberg
ad790d89d1
Fix DASSERT - TiledBatch requires at least 2 vectors.
...
Also use shorthand for weight types.
PiperOrigin-RevId: 643958371
2024-06-17 04:29:01 -07:00
The gemma.cpp Authors
7dbfa44794
Refactor CompressedWeights.
...
PiperOrigin-RevId: 643934198
2024-06-17 02:54:54 -07:00
Ray Smith
e0afdfa8fb
Added bias vector addition to MatMul
...
PiperOrigin-RevId: 643385381
2024-06-14 10:25:16 -07:00
The gemma.cpp Authors
2228055bb8
Internal change.
...
PiperOrigin-RevId: 643330703
2024-06-14 06:53:41 -07:00
Jan Wassenberg
29c0c574e6
Integrate matmul into FFW: 4.3x prefill speedup
...
```
before, bf16:
27.2929 prefill tokens / sec
17.2114 tokens / sec
after, bf16
116.496 prefill tokens / sec
17.5391 tokens / sec
```
PiperOrigin-RevId: 643328437
2024-06-14 06:32:26 -07:00
Ray Smith
198326a682
Removed now redundant non-batch matmul
...
PiperOrigin-RevId: 643317187
2024-06-14 05:13:36 -07:00
Andrey Vlasov
b17631c95f
Implement a missing (bf16, f32) tiled MatMul kernel.
...
PiperOrigin-RevId: 643313676
2024-06-14 04:54:40 -07:00
Jan Wassenberg
d3c6a45b59
Major duplicated code reduction in test/benchmarks
...
Helper functions to tokenize/wrap
Move LayersOutputFunc into RuntimeConfig
AcceptFunc passes the probability
Implement StringFromType using the parser, and verify results match
PiperOrigin-RevId: 643255119
2024-06-14 00:16:25 -07:00
Jan Wassenberg
c15ff9529c
Reduce duplication in Config* by inheriting no-SSM
...
PiperOrigin-RevId: 643030629
2024-06-13 09:48:56 -07:00
Ray Smith
ea525da967
Added MatMul_4x4_Batch which is MatMul_4x4, but with the first template arg moved to the first function arg, so the batch size (num A rows) can be variable at run-time.
...
PiperOrigin-RevId: 643017973
2024-06-13 09:05:40 -07:00
The gemma.cpp Authors
1b40619864
Increase parallelism in ops_test
...
PiperOrigin-RevId: 643013415
2024-06-13 08:50:41 -07:00