Jan Wassenberg
2141d4788d
Add IsAppendOnly flag to file and if true, disable parallel writes
...
PiperOrigin-RevId: 788805810
2025-07-30 01:51:37 -07:00
Jan Wassenberg
e76e29ce11
De-singleton ThreadingContext so callers can pass in their own
...
weights.cc: fix BindB argument for bf16 tensors
threading_test: enable autotune
PiperOrigin-RevId: 785763618
2025-07-22 02:08:46 -07:00
Jan Wassenberg
56c9196eb6
Add blob_path to config deduction message
...
PiperOrigin-RevId: 782188689
2025-07-11 18:58:56 -07:00
Jan Wassenberg
a04cc287b2
Move MatMulEnv out of Gemma to enable concurrent calls
...
Also update benchmark_helper config print: add profiler, remove free mem
PiperOrigin-RevId: 774662974
2025-06-23 01:20:09 -07:00
Daniel Keysers
d7b23d532a
Restructure internal initialization.
...
PiperOrigin-RevId: 769507096
2025-06-10 01:25:31 -07:00
Jan Wassenberg
9efdcfd45c
1.07x batch decode speedup: more BF16 weights and activations
...
BF16 att_sums and ffw_out
Support BF16 B views without decompression
Support arbitrary types in MulByConstAndAdd, AddFrom
Also update profiler annotations in ops-inl.h
PiperOrigin-RevId: 766995010
2025-06-03 23:30:18 -07:00
Jan Wassenberg
794a21a4e6
Major refactor to de-templatize gemma-inl and weights
...
This replaces per-weight instantiations of all code with only per-MatMul/norm.
Reduces binary size by 133KiB.
WeightsOwner is no longer required for type erasing, hence it is replaced with ModelWeightsPtrs.
Also remove unused EmbedToken, replaced with EmbedMMToken.
PiperOrigin-RevId: 766497657
2025-06-02 23:01:35 -07:00
Jan Wassenberg
cf4d7ceb82
1.16x decode speedup: remove last MatVec in Attention
...
Precompute row pointers.
Remove no longer used MHA support; QStride -> qkv_dim.
Remove RowPtr from MatMul interface, use only MatPtrT.
Require opt-in define for NUQ to speed up builds.
Also fix io.cc on Windows.
PiperOrigin-RevId: 766228108
2025-06-02 09:40:29 -07:00
Jan Wassenberg
cb188d4a0e
Fix RowT issue and improve Griffin (currently still broken)
...
Use type-safe MatPtrT via dynamic_cast, avoid/remove unsafe RowT
activations: Griffin tensors are now padded
Griffin: add batching support, fix conv1d_cache allocation
weights: bundle to TensorToRead, add kNoPad flag, fix SplitW1
const-correct fix for ForEachTensor
blob_store: move BlobIO2 to .cc and rename BlobIO
PiperOrigin-RevId: 760610094
2025-05-19 07:02:10 -07:00
Jan Wassenberg
c443adee33
3.8x speedup of weights loading via preadv on Linux
...
Also move BlobReader reading functionality to weights.cc
PiperOrigin-RevId: 759240310
2025-05-15 11:55:15 -07:00
Jan Wassenberg
d538a6d6c6
Cleanup: remove unused kCyclic, remove 2 suffix
...
Also remove now unused allocator arg and fix warnings (cast, struct/class mismatch)
PiperOrigin-RevId: 758098495
2025-05-13 01:06:41 -07:00
Jan Wassenberg
a0ff98ea60
Entirely remove constexpr on PaddedDirEnd. Refs #551
...
Apparently GCC 9.4 does not handle HWY_CXX17_CONSTEXPR as we intend.
PiperOrigin-RevId: 755967709
2025-05-07 12:48:19 -07:00
Jan Wassenberg
e9ecb7794d
Fix gcc build error and gemma3 crash, thanks @ufownl, fixes #551
...
PiperOrigin-RevId: 755729478
2025-05-07 00:59:18 -07:00
Jan Wassenberg
c8d92948f4
Move fields, io* and blob* from compression/ into io/
...
PiperOrigin-RevId: 755445712
2025-05-06 11:17:19 -07:00