Balazs Racz
338cd8a36e
Factors out a new cc_library `:query` from `:gemma-lib`.
...
Moves query-related structs/classes to gemma/query.h.
This refactors PerQuery, AllQueries, and QBatch into a dedicated header file, gemma/query.h, and updates BUILD dependencies accordingly.
PiperOrigin-RevId: 843604293
2025-12-12 02:53:56 -08:00
Jan Wassenberg
73c3627b67
Add tensor stats and output
...
tensor_info: add missing header
io: fix mode
weights.h: add layer_idx to LayerWeightsPtrs
PiperOrigin-RevId: 843531051
2025-12-11 22:52:46 -08:00
Krzysztof Rymski
64178ace38
Internal changes
...
PiperOrigin-RevId: 842727112
2025-12-10 07:55:17 -08:00
Martin Stolle
1014ae9e2a
Adding a simple test for GemmaAttention
...
PiperOrigin-RevId: 842135414
2025-12-09 02:13:03 -08:00
Jan Wassenberg
5a6895c609
Avoid warning when OS affinity limits us to the second socket
...
Also simplify NumSMT, detect from .smt field directly
PiperOrigin-RevId: 841749486
2025-12-08 07:10:43 -08:00
Martin Stolle
9348048885
Clean up toPtrs to delegate to toPtr
...
PiperOrigin-RevId: 840214969
2025-12-04 06:22:04 -08:00
Krzysztof Rymski
2b4436beb6
Internal changes
...
PiperOrigin-RevId: 840151004
2025-12-04 02:37:53 -08:00
Jan Wassenberg
ccb49bc82f
Add ToFloatSlow, move RandomFloat to test_util
...
PiperOrigin-RevId: 837412290
2025-11-27 00:14:51 -08:00
Krzysztof Rymski
c153d5255b
Internal changes
...
PiperOrigin-RevId: 837001762
2025-11-26 01:05:35 -08:00
Martin Stolle
35e9f9f05f
Introduce attention implementation configurability.
...
PiperOrigin-RevId: 828971705
2025-11-06 08:43:41 -08:00
Ray Smith
8a100c1e8d
Added access to flash attention internals to TileFlashAttention4
...
PiperOrigin-RevId: 826011137
2025-10-30 06:50:05 -07:00
Biruk Mammo
5a05857deb
[Gemma.cpp] Allows non-owned arguments for attention methods.
...
* Adds and uses a new `AttentionActivationPtrs` that holds non-owning `MatPtrs`. Acts as a view into `AttentionActivations`.
* Updates `QBatch` to hold non-owning `MatPtr`s to the kv caches.
* Enables the `MatPtrT` default constructor for simpler initializations.
* Pulls out and passes `LayerWeightsPtrs::query_norm_scale` directly. While `LayerWeightsPtrs` already held non-owning `MatPtr`s, this change avoids the need to find and construct several empty weight tensors just to construct one `query_norm_scale` tensor.
PiperOrigin-RevId: 824584177
2025-10-27 10:43:25 -07:00
Jan Wassenberg
3ed403e287
Major cleanup of profiler zones, add Caller annotation for all pool.Run
...
Pass ThreadingContext instead of Pools/Profiler individually, for access to Zones
Add GCPP_ZONE helper
Add Caller argument to pool.Run to enable new stats
Remove most direct dependencies on ThreadPool, prefer ParallelFor
PiperOrigin-RevId: 822934530
2025-10-23 01:54:24 -07:00
Phil Culliton
503aaddd65
Add 8-bit integer quantization (I8Stream) to Gemma.cpp.
...
PiperOrigin-RevId: 819787856
2025-10-15 09:25:20 -07:00
Ray Smith
fb6fa793f4
Added a global (to gemma) zones list to enable most call sites to PROFILER_ZONE3 to avoid the sychronization required for the static const initialization of the zone handle.
...
Improved flash_attention to enable profiling using the new zones.
PiperOrigin-RevId: 819235421
2025-10-14 08:30:58 -07:00
Jan Wassenberg
035273c184
tune pool kSpin mode in threading_context
...
Previously, this happened concurrently with the matmul autotune, which could lead to incorrect outcomes.
threading: de-singleton Pinning (no longer stores affinity); pass PoolWorkerMapping; fix Pool dtor order
Also enable SPR target (Zen4 is AMD-only),
update Highway version for renamed Thread()->GlobalIdx().
PiperOrigin-RevId: 816223017
2025-10-07 08:36:26 -07:00
Ray Smith
2f6cbde8ff
Added a smaller tile size to flash attention for smaller batch sizes
...
PiperOrigin-RevId: 813226193
2025-09-30 05:49:20 -07:00
Jan Wassenberg
fac8aac4cb
Internal change
...
PiperOrigin-RevId: 809975026
2025-09-22 05:37:03 -07:00
Jan Wassenberg
501fdf000e
Remove no longer used MatVec
...
PiperOrigin-RevId: 809059409
2025-09-19 09:03:22 -07:00
Ray Smith
f10ac41a20
Added flash attention, with both a single-q function, and a register-tiled function.
...
The register-tiled version achieves a speed-up by a factor of about 9.7 over the previous attention function on an AVX3-enabled machine.
PiperOrigin-RevId: 804913784
2025-09-09 08:05:26 -07:00
Jan Wassenberg
461a9c7d1b
Matmul refactoring towards fusion
...
MMLoops: move dispatch code out, use overloads
split build target into matmul_env (for MatMulEnv/MMOptions)
weights: no longer call BindB
Fix potential out of bounds in gemma_batch_bench
PiperOrigin-RevId: 804895985
2025-09-09 07:13:38 -07:00
Jan Wassenberg
2b4c16e243
Remove Griffin support
...
Also add IsObsolete helper
PiperOrigin-RevId: 803376921
2025-09-05 02:35:40 -07:00
Jan Wassenberg
5d1693e806
Internal change
...
PiperOrigin-RevId: 803083229
2025-09-04 10:31:20 -07:00
Jan Wassenberg
afd82376a5
Add AES-CTR RNG for parallel sampling (not yet used)
...
PiperOrigin-RevId: 802991142
2025-09-04 05:58:42 -07:00
Jan Wassenberg
1e3c853e80
Add ParallelFor wrapper function and one new mode
...
Move ParallelismType from matmul.h to threading.h
Replace SmallParallelFor with ParallelFor and the new mode
PiperOrigin-RevId: 802038452
2025-09-02 01:40:09 -07:00
Jan Wassenberg
229bd078a1
1.29x speedup: bf16 C1/C2. Extend most ops to any type, expand test coverage.
...
Also increase dot_test.cc range for Zen4, and matmul_test tolerance (failing in some configs)
PiperOrigin-RevId: 801789922
2025-09-01 06:34:04 -07:00
Jan Wassenberg
5411fd846d
Minor: batched NotifyGenerate, fix comment/dep
...
PiperOrigin-RevId: 799889802
2025-08-26 23:33:17 -07:00
Jan Wassenberg
faa4102992
(Resubmit) Prepare profiler annotations for new API
...
Pass hwy::Profiler& to low-level functions.
Used ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.
PiperOrigin-RevId: 794461159
2025-08-13 01:38:24 -07:00
The gemma.cpp Authors
a2d9133f7d
Prepare profiler annotations for new API
...
Pass hwy::Profiler& to low-level functions.
Used ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.
PiperOrigin-RevId: 793865287
2025-08-11 17:51:38 -07:00
Jan Wassenberg
4cbf63e6f0
Prepare profiler annotations for new API
...
Pass hwy::Profiler& to low-level functions.
Used ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.
PiperOrigin-RevId: 793821255
2025-08-11 15:34:52 -07:00
Ivo Ristovski List
b56b2f05e4
Automated Code Change
...
PiperOrigin-RevId: 789876258
2025-08-01 13:29:50 -07:00
Jan Wassenberg
799c264df3
Pre-tune thread pool before matmul
...
Also improve profiler annotations - remove near-zero ones and add more for startup
PiperOrigin-RevId: 789352414
2025-07-31 08:45:26 -07:00
Jan Wassenberg
d1638587f0
1.14x batch decode speedup: parallelize RMSNorm ops
...
Activations was over-parallelized, use single pool instead.
Also improve profiler zone annotations,
pass through worker args (for tracking concurrency), now non-optional.
PiperOrigin-RevId: 788790976
2025-07-30 00:55:45 -07:00
Jan Wassenberg
e76e29ce11
De-singleton ThreadingContext so callers can pass in their own
...
weights.cc: fix BindB argument for bf16 tensors
threading_test: enable autotune
PiperOrigin-RevId: 785763618
2025-07-22 02:08:46 -07:00
Jan Wassenberg
56c9196eb6
Add blob_path to config deduction message
...
PiperOrigin-RevId: 782188689
2025-07-11 18:58:56 -07:00
Jan Wassenberg
a04cc287b2
Move MatMulEnv out of Gemma to enable concurrent calls
...
Also update benchmark_helper config print: add profiler, remove free mem
PiperOrigin-RevId: 774662974
2025-06-23 01:20:09 -07:00
Jan Wassenberg
834cbe5b39
linkstatic in most tests/binaries, remove fully_static_link
...
Also decrease "eternal" timeout to "long".
Add 2x/4x larger subsections of Frankenstein (from Gutenberg)
PiperOrigin-RevId: 773252901
2025-06-19 01:45:53 -07:00
Jan Wassenberg
343482c7ef
1.02x batch decode speedup: BF16 KV cache
...
ops-inl.h: Vectorize Rope(), template
Remove unused MulBy, and extra-arg overloads of MulByConst and Softmax
Fix for DecompressAndZeroPad: ensure second vector filled
PiperOrigin-RevId: 772779163
2025-06-17 23:21:59 -07:00
Jan Wassenberg
cd80d8b24d
Speed up builds by skipping rarely used targets
...
Centralize previous code into GEMMA_DISABLED_TARGETS
PiperOrigin-RevId: 772433723
2025-06-17 05:44:20 -07:00
Jan Wassenberg
6773e4517c
Split Activations into Griffin/Attention to reduce memory usage for attention-only tests.
...
PiperOrigin-RevId: 772025282
2025-06-16 07:52:59 -07:00
Jan Wassenberg
2c72ff2aa5
Fix MatMul issue caused by autotuning bucketing, refs #608 , thanks @ufownl
...
PiperOrigin-RevId: 771077158
2025-06-13 06:58:42 -07:00
Jan Wassenberg
c027a45a2e
MatPtr-ify KV, shared div_seq_len, --seq_len flag
...
PiperOrigin-RevId: 770194455
2025-06-11 09:49:38 -07:00
Daniel Keysers
d7b23d532a
Restructure internal initialization.
...
PiperOrigin-RevId: 769507096
2025-06-10 01:25:31 -07:00
Jan Wassenberg
6ee628ba38
Further cleanup: separate MatMulEnv arg
...
move row_ptrs into MatMulEnv
Consistent arg order: layer, activations, kv_cache, env
PiperOrigin-RevId: 767886386
2025-06-05 20:48:32 -07:00
Jan Wassenberg
3a266c662c
Split gemma-inl into separate source files
...
weights, mat: zero-initialize padding, required since the MatMul "avoid B decompress" optimization.
PiperOrigin-RevId: 767562313
2025-06-05 05:36:44 -07:00
Jan Wassenberg
9efdcfd45c
1.07x batch decode speedup: more BF16 weights and activations
...
BF16 att_sums and ffw_out
Support BF16 B views without decompression
Support arbitrary types in MulByConstAndAdd, AddFrom
Also update profiler annotations in ops-inl.h
PiperOrigin-RevId: 766995010
2025-06-03 23:30:18 -07:00
Jan Wassenberg
794a21a4e6
Major refactor to de-templatize gemma-inl and weights
...
This replaces per-weight instantiations of all code with only per-MatMul/norm.
Reduces binary size by 133KiB.
WeightsOwner is no longer required for type erasing, hence it is replaced with ModelWeightsPtrs.
Also remove unused EmbedToken, replaced with EmbedMMToken.
PiperOrigin-RevId: 766497657
2025-06-02 23:01:35 -07:00
Jan Wassenberg
c4a75abe43
Cleanup gemma_batch_bench
...
PiperOrigin-RevId: 766177406
2025-06-02 07:04:36 -07:00
Jan Wassenberg
3890eb5412
Remove backprop/
...
Also remove MatPtrT::Packed(); use PackedScale1 instead where const, or Row(0).
PiperOrigin-RevId: 764243198
2025-05-28 07:01:17 -07:00
Jan Wassenberg
627cc04db9
Decouple MatMul from gemma-inl: precompile for all input types
...
Call MatMulStatic instead of MatMul.
Also fix build error due to Highway's Lanes not being constexpr.
PiperOrigin-RevId: 763777269
2025-05-27 07:08:58 -07:00