Jan Wassenberg
41a86d41a9
Fix preadv error: only enable if we have a handle
...
PiperOrigin-RevId: 795455020
2025-08-15 06:30:34 -07:00
Jan Wassenberg
4e062d68f7
Update BlobWriter comments, WriteAll->Finalize
...
PiperOrigin-RevId: 790792133
2025-08-04 10:01:38 -07:00
Ivo Ristovski List
b56b2f05e4
Automated Code Change
...
PiperOrigin-RevId: 789876258
2025-08-01 13:29:50 -07:00
Jan Wassenberg
799c264df3
Pre-tune thread pool before matmul
...
Also improve profiler annotations - remove near-zero ones and add more for startup
PiperOrigin-RevId: 789352414
2025-07-31 08:45:26 -07:00
Charles Zhao
50ee1a3e92
Write SBS progressively.
...
(1) Write directly to the file in BlobWriter::Add and destroy the MatOwner to release its RAM.
(2) Write a fake header to indicate this is V2, then write the correct header and directory at the end of the file.
(3) Tested loading SBS files written the old way and the new way; both worked.
PiperOrigin-RevId: 789306837
2025-07-31 06:05:38 -07:00
Jan Wassenberg
d831ddce5b
Fix file mapping: was letting the smart pointer go out of scope
...
Also save+print the IO mode used.
PiperOrigin-RevId: 788848165
2025-07-30 04:30:10 -07:00
Jan Wassenberg
2141d4788d
Add IsAppendOnly flag to file and if true, disable parallel writes
...
PiperOrigin-RevId: 788805810
2025-07-30 01:51:37 -07:00
Jan Wassenberg
e76e29ce11
De-singleton ThreadingContext so callers can pass in their own
...
weights.cc: fix BindB argument for bf16 tensors
threading_test: enable autotune
PiperOrigin-RevId: 785763618
2025-07-22 02:08:46 -07:00
Jan Wassenberg
56c9196eb6
Add blob_path to config deduction message
...
PiperOrigin-RevId: 782188689
2025-07-11 18:58:56 -07:00
Jan Wassenberg
a04cc287b2
Move MatMulEnv out of Gemma to enable concurrent calls
...
Also update benchmark_helper config print: add profiler, remove free mem
PiperOrigin-RevId: 774662974
2025-06-23 01:20:09 -07:00
Daniel Keysers
d7b23d532a
Restructure internal initialization.
...
PiperOrigin-RevId: 769507096
2025-06-10 01:25:31 -07:00
Jan Wassenberg
9efdcfd45c
1.07x batch decode speedup: more BF16 weights and activations
...
BF16 att_sums and ffw_out
Support BF16 B views without decompression
Support arbitrary types in MulByConstAndAdd, AddFrom
Also update profiler annotations in ops-inl.h
PiperOrigin-RevId: 766995010
2025-06-03 23:30:18 -07:00
Jan Wassenberg
794a21a4e6
Major refactor to de-templatize gemma-inl and weights
...
This replaces per-weight instantiations of all code with instantiations only per MatMul/norm.
Reduces binary size by 133KiB.
WeightsOwner is no longer required for type erasing, hence it is replaced with ModelWeightsPtrs.
Also remove unused EmbedToken, replaced with EmbedMMToken.
PiperOrigin-RevId: 766497657
2025-06-02 23:01:35 -07:00
Jan Wassenberg
cf4d7ceb82
1.16x decode speedup: remove last MatVec in Attention
...
Precompute row pointers.
Remove no longer used MHA support; QStride -> qkv_dim.
Remove RowPtr from MatMul interface, use only MatPtrT.
Require opt-in define for NUQ to speed up builds.
Also fix io.cc on Windows.
PiperOrigin-RevId: 766228108
2025-06-02 09:40:29 -07:00
Jan Wassenberg
cb188d4a0e
Fix RowT issue and improve Griffin (currently still broken)
...
Use type-safe MatPtrT via dynamic_cast, avoid/remove unsafe RowT
activations: Griffin tensors are now padded
Griffin: add batching support, fix conv1d_cache allocation
weights: bundle to TensorToRead, add kNoPad flag, fix SplitW1
const-correct fix for ForEachTensor
blob_store: move BlobIO2 to .cc and rename BlobIO
PiperOrigin-RevId: 760610094
2025-05-19 07:02:10 -07:00
Jan Wassenberg
c443adee33
3.8x speedup of weights loading via preadv on Linux
...
Also move BlobReader reading functionality to weights.cc
PiperOrigin-RevId: 759240310
2025-05-15 11:55:15 -07:00
Jan Wassenberg
d538a6d6c6
Cleanup: remove unused kCyclic, remove 2 suffix
...
Also remove now unused allocator arg and fix warnings (cast, struct/class mismatch)
PiperOrigin-RevId: 758098495
2025-05-13 01:06:41 -07:00
Jan Wassenberg
a0ff98ea60
Entirely remove constexpr on PaddedDirEnd. Refs #551
...
Apparently GCC 9.4 does not handle HWY_CXX17_CONSTEXPR as we intend.
PiperOrigin-RevId: 755967709
2025-05-07 12:48:19 -07:00
Jan Wassenberg
e9ecb7794d
Fix gcc build error and gemma3 crash, thanks @ufownl, fixes #551
...
PiperOrigin-RevId: 755729478
2025-05-07 00:59:18 -07:00
Jan Wassenberg
c8d92948f4
Move fields, io* and blob* from compression/ into io/
...
PiperOrigin-RevId: 755445712
2025-05-06 11:17:19 -07:00