Commit Graph

852 Commits

Author SHA1 Message Date
Jan Wassenberg 2b4c16e243 Remove Griffin support
Also add IsObsolete helper

PiperOrigin-RevId: 803376921
2025-09-05 02:35:40 -07:00
Jan Wassenberg 56186193c1 Replace mt19937 with new generator to enable parallel sampling
Split it into immutable AesCtrEngine and RngStream
Also add RowSpan and Logits span

PiperOrigin-RevId: 803336423
2025-09-04 23:49:10 -07:00
Jan Wassenberg 5d1693e806 Internal change
PiperOrigin-RevId: 803083229
2025-09-04 10:31:20 -07:00
Jan Wassenberg afd82376a5 Add AES-CTR RNG for parallel sampling (not yet used)
PiperOrigin-RevId: 802991142
2025-09-04 05:58:42 -07:00
Jan Wassenberg 4be4799727 Remove kMaxPackages and per-package-related code
matmul: remove kMaxClusters, dynamic allocation
PiperOrigin-RevId: 802950348
2025-09-04 03:33:12 -07:00
Jan Wassenberg 7263ab8445 MatMul simplification, threading strategy improvements
remove MatMul f32 special case (smaller code),
types: Add u32/u64 for use by Activations
move renamed ParallelismStrategy to threading_context so can pass ctx
ensure worker index is unique across clusters
matmul.h: const member functions for renamed policy classes (easier to call)
PiperOrigin-RevId: 802848086
2025-09-03 21:45:07 -07:00
Marie White 74ffe079c4 Create separate MMStorage objects per cluster.
PiperOrigin-RevId: 802588625
2025-09-03 09:35:48 -07:00
Jan Wassenberg b7b3d353db Simplify MatMul: remove F32 special case (build time)
Also move kMaxM into separate kMaxBatchSize

PiperOrigin-RevId: 802086590
2025-09-02 04:29:21 -07:00
Jan Wassenberg 1e3c853e80 Add ParallelFor wrapper function and one new mode
Move ParallelismType from matmul.h to threading.h
Replace SmallParallelFor with ParallelFor and the new mode

PiperOrigin-RevId: 802038452
2025-09-02 01:40:09 -07:00
Marie White 3737224132 Add in-cluster parallel policy. Update policy to include cluster_idx.
PiperOrigin-RevId: 802016308
2025-09-02 00:16:00 -07:00
Marie White 27cb8e12d9 Handle non-threading parallel policy.
PiperOrigin-RevId: 802012517
2025-09-02 00:02:57 -07:00
Marie White 0d2e74d74a Add MMOptions as an argument to Matmul.
PiperOrigin-RevId: 802008198
2025-09-01 23:46:39 -07:00
Jan Wassenberg 229bd078a1 1.29x speedup: bf16 C1/C2. Extend most ops to any type, expand test coverage.
Also increase dot_test.cc range for Zen4, and matmul_test tolerance (failing in some configs)

PiperOrigin-RevId: 801789922
2025-09-01 06:34:04 -07:00
Marie White bc0c0bac8b Add non-threading parallel policy.
PiperOrigin-RevId: 800913294
2025-08-29 08:39:06 -07:00
Marie White 00b70f69c5 Include parallelism type in DoMatMul. Also remove package handling.
PiperOrigin-RevId: 800902568
2025-08-29 08:04:52 -07:00
Jan Wassenberg 0ae8646731 Fix remainder handling for Paligemma
No longer attempt to skip the remainder handling because B might also be a non-padded view.

PiperOrigin-RevId: 800890805
2025-08-29 07:25:52 -07:00
Marie White 973e284ed6 Refactor Matmul to use a policy class for parallelization.
PiperOrigin-RevId: 800864489
2025-08-29 05:40:39 -07:00
Jan Wassenberg 6c39a2dea4 1.01x speedup: More bf16 activations to reduce DecompressA.
Also move observer call into function, format gemma_args.

PiperOrigin-RevId: 800827400
2025-08-29 03:19:01 -07:00
Jan Wassenberg 7288891439 Remove F64 partial storage in matmul.
Also remove no longer used kMaxN; row_ptrs only used for C

PiperOrigin-RevId: 800774757
2025-08-29 00:12:08 -07:00
Jan Wassenberg 31c09cca4c f32 LoopKC: 1.37x(M=512), 1.19(M=128) single-K F32,BF16 matmul speedup on SKX
Add a special case for A=F32,B=BF16, used when there is no native bf16 dot product.

dot-inl: ensure bf16,f32 and f32,bf16 both get promoted to float before f64 summation
matmul.cc: update autotuning to reflect actual A size
matmul_test: add all combinations of bf16/f32, report all results, not just first difference, check non-vector-aligned K
PiperOrigin-RevId: 800487817
2025-08-28 08:55:50 -07:00
Jan Wassenberg 98ddc166db Expand ThreadingContext comments
PiperOrigin-RevId: 800479954
2025-08-28 08:32:10 -07:00
Marie White 6128e758ff Change ffw_out from B16 to F32.
PiperOrigin-RevId: 800330411
2025-08-28 00:01:39 -07:00
Jan Wassenberg 5411fd846d Minor: batched NotifyGenerate, fix comment/dep
PiperOrigin-RevId: 799889802
2025-08-26 23:33:17 -07:00
Jan Wassenberg 86afd53076 1.04x speedup: Parallelize SoftCap
Also require opt-in constexpr flag for observer callbacks, update zones

PiperOrigin-RevId: 799655163
2025-08-26 11:55:20 -07:00
Jan Wassenberg ed2f0bd1b0 Fix pos assertions, refs #665
Ensure the streaming func pos matches the number of calls.
Add two arguments that control pos+1 and pos+=1 behavior.
Also cleanup/add comments.
run: use batch_stream_func, add assert, higher verbosity for MM autotune output
PiperOrigin-RevId: 799511163
2025-08-26 04:50:40 -07:00
Jan Wassenberg 9bf0fe4e37 Internal change
PiperOrigin-RevId: 799509375
2025-08-26 04:44:08 -07:00
Jan Wassenberg d3a5ddf657 Merge pull request #663 from junjihashimoto:feature/api-server
PiperOrigin-RevId: 797731089
2025-08-24 11:57:05 +02:00
Rhett Stucki 73f1140dca Fix an off-by-one error after StreamAndUpdateEOS() to remove the MSAN warning about reading an uninitialized variable in the kv_cache.
The logic for choosing whether or not to attend to the last token during prefill wasn't completely consistent with StreamAndUpdateEOS(), causing an off-by-one error that prevented the kv_cache from being fully populated.

PiperOrigin-RevId: 797614310
2025-08-20 22:59:58 -07:00
Junji Hashimoto 41321611fd feature: add API server and client with Google protocol 2025-08-21 11:32:48 +09:00
Jan Wassenberg 41a86d41a9 Fix preadv error: only enable if we have a handle
PiperOrigin-RevId: 795455020
2025-08-15 06:30:34 -07:00
Phil Culliton 78573b6718 Internal change. Add deduction for 270M.
PiperOrigin-RevId: 795041810
2025-08-14 08:04:38 -07:00
Phil Culliton d044801c1d Internal change
PiperOrigin-RevId: 794620076
2025-08-13 09:47:45 -07:00
Jan Wassenberg 71406cf6d0 More profiler interface fixes: hwy:: plus avoid ADD_ZONE
PiperOrigin-RevId: 794493165
2025-08-13 03:15:48 -07:00
Jan Wassenberg faa4102992 (Resubmit) Prepare profiler annotations for new API
Pass hwy::Profiler& to low-level functions.
Used ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.

PiperOrigin-RevId: 794461159
2025-08-13 01:38:24 -07:00
The gemma.cpp Authors a2d9133f7d Prepare profiler annotations for new API
Pass hwy::Profiler& to low-level functions.
Used ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.

PiperOrigin-RevId: 793865287
2025-08-11 17:51:38 -07:00
Jan Wassenberg 4cbf63e6f0 Prepare profiler annotations for new API
Pass hwy::Profiler& to low-level functions.
Used ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.

PiperOrigin-RevId: 793821255
2025-08-11 15:34:52 -07:00
Jan Wassenberg eef564e8f0 Prepare profiler annotations for new API
PiperOrigin-RevId: 792808391
2025-08-08 16:51:29 -07:00
Copybara-Service 2e9c93a609 Merge pull request #649 from KaranocaVe:main
PiperOrigin-RevId: 792678119
2025-08-08 10:35:57 -07:00
Jan Wassenberg 33fbac0880 Exporter updates/fixes
PiperOrigin-RevId: 791046073
2025-08-04 22:36:33 -07:00
Jan Wassenberg 4e062d68f7 Update BlobWriter comments, WriteAll->Finalize
PiperOrigin-RevId: 790792133
2025-08-04 10:01:38 -07:00
Jan Wassenberg 701841897b Default to disabling per-socket parallelization
weights: default to Read for small-batch (only look at qbatch, not the larger prefill tbatch)
PiperOrigin-RevId: 790787643
2025-08-04 09:49:14 -07:00
Ivo Ristovski List b56b2f05e4 Automated Code Change
PiperOrigin-RevId: 789876258
2025-08-01 13:29:50 -07:00
Jan Wassenberg 799c264df3 Pre-tune thread pool before matmul
Also improve profiler annotations - remove near-zero ones and add more for startup

PiperOrigin-RevId: 789352414
2025-07-31 08:45:26 -07:00
KaranocaVe 32286f0465
Merge branch 'dev' into main 2025-07-31 22:40:56 +08:00
Charles Zhao 50ee1a3e92 Write SBS progressively.
(1) Directly write to file in BlobWriter::Add and destruct the MatOwner to release the rams.

(2) Write a fake header to indicate this is V2, and write correct header and directory at the end of the file.

(3) Tested on loading sbs written the old way, and new way, both worked.

PiperOrigin-RevId: 789306837
2025-07-31 06:05:38 -07:00
KaranocaVe 0ea118ebbe Update run.cc, CMakeLists and README for incompatible code, dependency changes and argument updates 2025-07-31 00:59:16 +08:00
Jan Wassenberg 8715eda512 Improved layer idx parsing
PiperOrigin-RevId: 788868522
2025-07-30 05:49:45 -07:00
Jan Wassenberg d831ddce5b Fix file mapping: was letting the smart pointer go out of scope
Also save+print the IO mode used.

PiperOrigin-RevId: 788848165
2025-07-30 04:30:10 -07:00
Jan Wassenberg 2141d4788d Add IsAppendOnly flag to file and if true, disable parallel writes
PiperOrigin-RevId: 788805810
2025-07-30 01:51:37 -07:00
Jan Wassenberg d22ba2ac96 Update layer index parsing and allow tokenizer override
PiperOrigin-RevId: 788797948
2025-07-30 01:22:34 -07:00