Jan Wassenberg
2b4c16e243
Remove Griffin support
...
Also add IsObsolete helper
PiperOrigin-RevId: 803376921
2025-09-05 02:35:40 -07:00
Jan Wassenberg
56186193c1
Replace mt19937 with new generator to enable parallel sampling
...
Split it into immutable AesCtrEngine and RngStream
Also add RowSpan and Logits span
PiperOrigin-RevId: 803336423
2025-09-04 23:49:10 -07:00
Jan Wassenberg
5d1693e806
Internal change
...
PiperOrigin-RevId: 803083229
2025-09-04 10:31:20 -07:00
Jan Wassenberg
afd82376a5
Add AES-CTR RNG for parallel sampling (not yet used)
...
PiperOrigin-RevId: 802991142
2025-09-04 05:58:42 -07:00
Jan Wassenberg
4be4799727
Remove kMaxPackages and per-package-related code
...
matmul: remove kMaxClusters, dynamic allocation
PiperOrigin-RevId: 802950348
2025-09-04 03:33:12 -07:00
Jan Wassenberg
7263ab8445
MatMul simplification, threading strategy improvements
...
remove MatMul f32 special case (smaller code)
types: Add u32/u64 for use by Activations
move renamed ParallelismStrategy to threading_context so we can pass ctx
ensure worker index is unique across clusters
matmul.h: const member functions for renamed policy classes (easier to call)
PiperOrigin-RevId: 802848086
2025-09-03 21:45:07 -07:00
Marie White
74ffe079c4
Create separate MMStorage objects per cluster.
...
PiperOrigin-RevId: 802588625
2025-09-03 09:35:48 -07:00
Jan Wassenberg
b7b3d353db
Simplify MatMul: remove F32 special case (build time)
...
Also move kMaxM into separate kMaxBatchSize
PiperOrigin-RevId: 802086590
2025-09-02 04:29:21 -07:00
Jan Wassenberg
1e3c853e80
Add ParallelFor wrapper function and one new mode
...
Move ParallelismType from matmul.h to threading.h
Replace SmallParallelFor with ParallelFor and the new mode
PiperOrigin-RevId: 802038452
2025-09-02 01:40:09 -07:00
Marie White
3737224132
Add in-cluster parallel policy. Update policy to include cluster_idx.
...
PiperOrigin-RevId: 802016308
2025-09-02 00:16:00 -07:00
Marie White
27cb8e12d9
Handle non-threading parallel policy.
...
PiperOrigin-RevId: 802012517
2025-09-02 00:02:57 -07:00
Marie White
0d2e74d74a
Add MMOptions as an argument to Matmul.
...
PiperOrigin-RevId: 802008198
2025-09-01 23:46:39 -07:00
Jan Wassenberg
229bd078a1
1.29x speedup: bf16 C1/C2. Extend most ops to any type, expand test coverage.
...
Also increase dot_test.cc range for Zen4, and matmul_test tolerance (failing in some configs)
PiperOrigin-RevId: 801789922
2025-09-01 06:34:04 -07:00
Marie White
bc0c0bac8b
Add non-threading parallel policy.
...
PiperOrigin-RevId: 800913294
2025-08-29 08:39:06 -07:00
Marie White
00b70f69c5
Include parallelism type in DoMatMul. Also remove package handling.
...
PiperOrigin-RevId: 800902568
2025-08-29 08:04:52 -07:00
Jan Wassenberg
0ae8646731
Fix remainder handling for Paligemma
...
No longer attempt to skip the remainder handling because B might also be a non-padded view.
PiperOrigin-RevId: 800890805
2025-08-29 07:25:52 -07:00
Marie White
973e284ed6
Refactor Matmul to use a policy class for parallelization.
...
PiperOrigin-RevId: 800864489
2025-08-29 05:40:39 -07:00
Jan Wassenberg
6c39a2dea4
1.01x speedup: More bf16 activations to reduce DecompressA.
...
Also move observer call into function, format gemma_args.
PiperOrigin-RevId: 800827400
2025-08-29 03:19:01 -07:00
Jan Wassenberg
7288891439
Remove F64 partial storage in matmul.
...
Also remove no longer used kMaxN; row_ptrs only used for C
PiperOrigin-RevId: 800774757
2025-08-29 00:12:08 -07:00
Jan Wassenberg
31c09cca4c
f32 LoopKC: 1.37x (M=512), 1.19x (M=128) single-K F32,BF16 matmul speedup on SKX
...
Add a special case for A=F32,B=BF16, used when there is no native bf16 dot product.
dot-inl: ensure bf16,f32 and f32,bf16 both get promoted to float before f64 summation
matmul.cc: update autotuning to reflect actual A size
matmul_test: add all combinations of bf16/f32, report all results, not just first difference, check non-vector-aligned K
PiperOrigin-RevId: 800487817
2025-08-28 08:55:50 -07:00
Jan Wassenberg
98ddc166db
Expand ThreadingContext comments
...
PiperOrigin-RevId: 800479954
2025-08-28 08:32:10 -07:00
Marie White
6128e758ff
Change ffw_out from B16 to F32.
...
PiperOrigin-RevId: 800330411
2025-08-28 00:01:39 -07:00
Jan Wassenberg
5411fd846d
Minor: batched NotifyGenerate, fix comment/dep
...
PiperOrigin-RevId: 799889802
2025-08-26 23:33:17 -07:00
Jan Wassenberg
86afd53076
1.04x speedup: Parallelize SoftCap
...
Also require opt-in constexpr flag for observer callbacks, update zones
PiperOrigin-RevId: 799655163
2025-08-26 11:55:20 -07:00
Jan Wassenberg
ed2f0bd1b0
Fix pos assertions, refs #665
...
Ensure the streaming func pos matches the number of calls.
Add two arguments that control pos+1 and pos+=1 behavior.
Also cleanup/add comments.
run: use batch_stream_func, add assert, higher verbosity for MM autotune output
PiperOrigin-RevId: 799511163
2025-08-26 04:50:40 -07:00
Jan Wassenberg
9bf0fe4e37
Internal change
...
PiperOrigin-RevId: 799509375
2025-08-26 04:44:08 -07:00
Jan Wassenberg
d3a5ddf657
Merge pull request #663 from junjihashimoto:feature/api-server
...
PiperOrigin-RevId: 797731089
2025-08-24 11:57:05 +02:00
Rhett Stucki
73f1140dca
Fix an off-by-one error after StreamAndUpdateEOS() to remove the MSAN warning about reading an uninitialized variable in the kv_cache.
...
The logic for choosing whether or not to attend to the last token during prefill wasn't completely consistent with StreamAndUpdateEOS(), causing an off-by-one error that prevented the kv_cache from being fully populated.
PiperOrigin-RevId: 797614310
2025-08-20 22:59:58 -07:00
Junji Hashimoto
41321611fd
feature: add API server and client with Google protocol
2025-08-21 11:32:48 +09:00
Jan Wassenberg
41a86d41a9
Fix preadv error: only enable if we have a handle
...
PiperOrigin-RevId: 795455020
2025-08-15 06:30:34 -07:00
Phil Culliton
78573b6718
Internal change. Add deduction for 270M.
...
PiperOrigin-RevId: 795041810
2025-08-14 08:04:38 -07:00
Phil Culliton
d044801c1d
Internal change
...
PiperOrigin-RevId: 794620076
2025-08-13 09:47:45 -07:00
Jan Wassenberg
71406cf6d0
More profiler interface fixes: hwy:: plus avoid ADD_ZONE
...
PiperOrigin-RevId: 794493165
2025-08-13 03:15:48 -07:00
Jan Wassenberg
faa4102992
(Resubmit) Prepare profiler annotations for new API
...
Pass hwy::Profiler& to low-level functions.
Use ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.
PiperOrigin-RevId: 794461159
2025-08-13 01:38:24 -07:00
The gemma.cpp Authors
a2d9133f7d
Prepare profiler annotations for new API
...
Pass hwy::Profiler& to low-level functions.
Use ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.
PiperOrigin-RevId: 793865287
2025-08-11 17:51:38 -07:00
Jan Wassenberg
4cbf63e6f0
Prepare profiler annotations for new API
...
Pass hwy::Profiler& to low-level functions.
Use ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.
PiperOrigin-RevId: 793821255
2025-08-11 15:34:52 -07:00
Jan Wassenberg
eef564e8f0
Prepare profiler annotations for new API
...
PiperOrigin-RevId: 792808391
2025-08-08 16:51:29 -07:00
Copybara-Service
2e9c93a609
Merge pull request #649 from KaranocaVe:main
...
PiperOrigin-RevId: 792678119
2025-08-08 10:35:57 -07:00
Jan Wassenberg
33fbac0880
Exporter updates/fixes
...
PiperOrigin-RevId: 791046073
2025-08-04 22:36:33 -07:00
Jan Wassenberg
4e062d68f7
Update BlobWriter comments, WriteAll->Finalize
...
PiperOrigin-RevId: 790792133
2025-08-04 10:01:38 -07:00
Jan Wassenberg
701841897b
Default to disabling per-socket parallelization
...
weights: default to Read for small-batch (only look at qbatch, not the larger prefill tbatch)
PiperOrigin-RevId: 790787643
2025-08-04 09:49:14 -07:00
Ivo Ristovski List
b56b2f05e4
Automated Code Change
...
PiperOrigin-RevId: 789876258
2025-08-01 13:29:50 -07:00
Jan Wassenberg
799c264df3
Pre-tune thread pool before matmul
...
Also improve profiler annotations - remove near-zero ones and add more for startup
PiperOrigin-RevId: 789352414
2025-07-31 08:45:26 -07:00
KaranocaVe
32286f0465
Merge branch 'dev' into main
2025-07-31 22:40:56 +08:00
Charles Zhao
50ee1a3e92
Write SBS progressively.
...
(1) Write directly to the file in BlobWriter::Add and destroy the MatOwner to release its RAM.
(2) Write a fake header to indicate that this is V2, then write the correct header and directory at the end of the file.
(3) Tested loading SBS files written both the old way and the new way; both worked.
PiperOrigin-RevId: 789306837
2025-07-31 06:05:38 -07:00
KaranocaVe
0ea118ebbe
Update run.cc, CMakeLists and README for incompatible code, dependency changes and argument updates
2025-07-31 00:59:16 +08:00
Jan Wassenberg
8715eda512
Improved layer idx parsing
...
PiperOrigin-RevId: 788868522
2025-07-30 05:49:45 -07:00
Jan Wassenberg
d831ddce5b
Fix file mapping: was letting the smart pointer go out of scope
...
Also save+print the IO mode used.
PiperOrigin-RevId: 788848165
2025-07-30 04:30:10 -07:00
Jan Wassenberg
2141d4788d
Add IsAppendOnly flag to file and if true, disable parallel writes
...
PiperOrigin-RevId: 788805810
2025-07-30 01:51:37 -07:00
Jan Wassenberg
d22ba2ac96
Update layer index parsing and allow tokenizer override
...
PiperOrigin-RevId: 788797948
2025-07-30 01:22:34 -07:00