Krzysztof Rymski
539d9bb8e7
Change to use faster exponent function
...
PiperOrigin-RevId: 877981568
2026-03-03 09:16:04 -08:00
Ray Smith
49cb438b1e
Rollback of erroneous rollback.
...
PiperOrigin-RevId: 877376165
2026-03-02 06:50:26 -08:00
The gemma.cpp Authors
a3d994915f
No public description
...
PiperOrigin-RevId: 877333188
2026-03-02 04:32:29 -08:00
Ray Smith
16c1b29b89
Rewrote flash attention to use BF16, transpose k and v, rewrote the task distribution, increase parallelism on decode, and use double the registers for the core of flash attention.
...
PiperOrigin-RevId: 877308306
2026-03-02 03:11:01 -08:00
Nikhil Dev Goyal
dd268ddbe8
Add FastGelu activation function in a newly created created fast_ops-inl.h files.
...
This replaces the Tanh call with FastTanh call in the Gelu function written in math-inl.h.
PiperOrigin-RevId: 876339830
2026-02-27 11:14:47 -08:00
Krzysztof Rymski
df162ead7c
Implementation of tiled attention with bf16 and circular buffers which reduces memory requirements by 4x on longer context on gemma models.
...
It also supports better parallelism for small batch sizes / small models.
It also is able to utilize VDPBF16PS for nice 2x improvement on avx512
PiperOrigin-RevId: 874517319
2026-02-24 03:26:49 -08:00
Ray Smith
76d7951242
Added wheat_from_chaff_test to test the ability of a model to find a needle in a haystack of data.
...
Replaced flag with attention_impl to control which attention to run.
PiperOrigin-RevId: 869694868
2026-02-13 06:05:30 -08:00
Krzysztof Rymski
16a7ba2d6e
Internal changes
...
PiperOrigin-RevId: 854171429
2026-01-09 06:35:36 -08:00
Krzysztof Rymski
2ee1fac74c
Internal changes
...
PiperOrigin-RevId: 853138600
2026-01-07 01:21:37 -08:00
Krzysztof Rymski
b73a9ede8f
Internal changes
...
PiperOrigin-RevId: 846648337
2025-12-19 02:46:18 -08:00
Krzysztof Rymski
6661d3a60c
Internal changes
...
PiperOrigin-RevId: 846140314
2025-12-18 01:26:43 -08:00
Liam Miller-Cushon
142e6a7e9c
No public description
...
PiperOrigin-RevId: 846030124
2025-12-17 20:10:54 -08:00
Balazs Racz
596bdfe5af
Separate monolithic gemma_lib library into more specific cc_library targets.
...
Creates new cc_library targets for :attention, :tensor_stats and :activations. Eliminates cyclic dependencies between these libraries.
PiperOrigin-RevId: 845690136
2025-12-17 03:31:16 -08:00
Balazs Racz
baa69dfb78
Makes the entire runtime_config passed into the activations constructor.
...
PiperOrigin-RevId: 845153671
2025-12-16 01:56:52 -08:00
Krzysztof Rymski
44dfd69b9b
Internal changes
...
PiperOrigin-RevId: 844759322
2025-12-15 07:14:37 -08:00
Jan Wassenberg
0c64987a96
Abort if args are unrecognized, refactor argument passing
...
This catches typos/incorrect usage.
Refactor: group Loader/Threading/Inference into GemmaArgs.
All *Args ctors now have an extra ConsumedArgs& argument.
PiperOrigin-RevId: 844690553
2025-12-15 03:18:45 -08:00
Balazs Racz
338cd8a36e
Factors out a new cc_library `:query` from `:gemma-lib`.
...
Moves query-related structs/classes to gemma/query.h.
This refactors PerQuery, AllQueries, and QBatch into a dedicated header file, gemma/query.h, and updates BUILD dependencies accordingly.
PiperOrigin-RevId: 843604293
2025-12-12 02:53:56 -08:00
Jan Wassenberg
73c3627b67
Add tensor stats and output
...
tensor_info: add missing header
io: fix mode
weights.h: add layer_idx to LayerWeightsPtrs
PiperOrigin-RevId: 843531051
2025-12-11 22:52:46 -08:00
Krzysztof Rymski
64178ace38
Internal changes
...
PiperOrigin-RevId: 842727112
2025-12-10 07:55:17 -08:00
Martin Stolle
1014ae9e2a
Adding a simple test for GemmaAttention
...
PiperOrigin-RevId: 842135414
2025-12-09 02:13:03 -08:00
Jan Wassenberg
5a6895c609
Avoid warning when OS affinity limits us to the second socket
...
Also simplify NumSMT, detect from .smt field directly
PiperOrigin-RevId: 841749486
2025-12-08 07:10:43 -08:00
Martin Stolle
9348048885
Clean up toPtrs to delegate to toPtr
...
PiperOrigin-RevId: 840214969
2025-12-04 06:22:04 -08:00
Krzysztof Rymski
2b4436beb6
Internal changes
...
PiperOrigin-RevId: 840151004
2025-12-04 02:37:53 -08:00
Jan Wassenberg
ccb49bc82f
Add ToFloatSlow, move RandomFloat to test_util
...
PiperOrigin-RevId: 837412290
2025-11-27 00:14:51 -08:00
Krzysztof Rymski
c153d5255b
Internal changes
...
PiperOrigin-RevId: 837001762
2025-11-26 01:05:35 -08:00
Martin Stolle
35e9f9f05f
Introduce attention implementation configurability.
...
PiperOrigin-RevId: 828971705
2025-11-06 08:43:41 -08:00
Ray Smith
8a100c1e8d
Added access to flash attention internals to TileFlashAttention4
...
PiperOrigin-RevId: 826011137
2025-10-30 06:50:05 -07:00
Biruk Mammo
5a05857deb
[Gemma.cpp] Allows non-owned arguments for attention methods.
...
* Adds and uses a new `AttentionActivationPtrs` that holds non-owning `MatPtrs`. Acts as a view into `AttentionActivations`.
* Updates `QBatch` to hold non-owning `MatPtr`s to the kv caches.
* Enables the `MatPtrT` default constructor for simpler initializations.
* Pulls out and passes `LayerWeightsPtrs::query_norm_scale` directly. While `LayerWeightsPtrs` already held non-owning `MatPtr`s, this change avoids the need to find and construct several empty weight tensors just to construct one `query_norm_scale` tensor.
PiperOrigin-RevId: 824584177
2025-10-27 10:43:25 -07:00
Jan Wassenberg
3ed403e287
Major cleanup of profiler zones, add Caller annotation for all pool.Run
...
Pass ThreadingContext instead of Pools/Profiler individually, for access to Zones
Add GCPP_ZONE helper
Add Caller argument to pool.Run to enable new stats
Remove most direct dependencies on ThreadPool, prefer ParallelFor
PiperOrigin-RevId: 822934530
2025-10-23 01:54:24 -07:00
Phil Culliton
503aaddd65
Add 8-bit integer quantization (I8Stream) to Gemma.cpp.
...
PiperOrigin-RevId: 819787856
2025-10-15 09:25:20 -07:00
Ray Smith
fb6fa793f4
Added a global (to gemma) zones list to enable most call sites to PROFILER_ZONE3 to avoid the sychronization required for the static const initialization of the zone handle.
...
Improved flash_attention to enable profiling using the new zones.
PiperOrigin-RevId: 819235421
2025-10-14 08:30:58 -07:00
Jan Wassenberg
035273c184
tune pool kSpin mode in threading_context
...
Previously, this happened concurrently with the matmul autotune, which could lead to incorrect outcomes.
threading: de-singleton Pinning (no longer stores affinity); pass PoolWorkerMapping; fix Pool dtor order
Also enable SPR target (Zen4 is AMD-only),
update Highway version for renamed Thread()->GlobalIdx().
PiperOrigin-RevId: 816223017
2025-10-07 08:36:26 -07:00
Ray Smith
2f6cbde8ff
Added a smaller tile size to flash attention for smaller batch sizes
...
PiperOrigin-RevId: 813226193
2025-09-30 05:49:20 -07:00
Jan Wassenberg
fac8aac4cb
Internal change
...
PiperOrigin-RevId: 809975026
2025-09-22 05:37:03 -07:00
Jan Wassenberg
501fdf000e
Remove no longer used MatVec
...
PiperOrigin-RevId: 809059409
2025-09-19 09:03:22 -07:00
Ray Smith
f10ac41a20
Added flash attention, with both a single-q function, and a register-tiled function.
...
The register-tiled version achieves a speed-up by a factor of about 9.7 over the previous attention function on an AVX3-enabled machine.
PiperOrigin-RevId: 804913784
2025-09-09 08:05:26 -07:00
Jan Wassenberg
461a9c7d1b
Matmul refactoring towards fusion
...
MMLoops: move dispatch code out, use overloads
split build target into matmul_env (for MatMulEnv/MMOptions)
weights: no longer call BindB
Fix potential out of bounds in gemma_batch_bench
PiperOrigin-RevId: 804895985
2025-09-09 07:13:38 -07:00
Jan Wassenberg
2b4c16e243
Remove Griffin support
...
Also add IsObsolete helper
PiperOrigin-RevId: 803376921
2025-09-05 02:35:40 -07:00
Jan Wassenberg
5d1693e806
Internal change
...
PiperOrigin-RevId: 803083229
2025-09-04 10:31:20 -07:00
Jan Wassenberg
afd82376a5
Add AES-CTR RNG for parallel sampling (not yet used)
...
PiperOrigin-RevId: 802991142
2025-09-04 05:58:42 -07:00
Jan Wassenberg
1e3c853e80
Add ParallelFor wrapper function and one new mode
...
Move ParallelismType from matmul.h to threading.h
Replace SmallParallelFor with ParallelFor and the new mode
PiperOrigin-RevId: 802038452
2025-09-02 01:40:09 -07:00
Jan Wassenberg
229bd078a1
1.29x speedup: bf16 C1/C2. Extend most ops to any type, expand test coverage.
...
Also increase dot_test.cc range for Zen4, and matmul_test tolerance (failing in some configs)
PiperOrigin-RevId: 801789922
2025-09-01 06:34:04 -07:00
Jan Wassenberg
5411fd846d
Minor: batched NotifyGenerate, fix comment/dep
...
PiperOrigin-RevId: 799889802
2025-08-26 23:33:17 -07:00
Jan Wassenberg
faa4102992
(Resubmit) Prepare profiler annotations for new API
...
Pass hwy::Profiler& to low-level functions.
Used ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.
PiperOrigin-RevId: 794461159
2025-08-13 01:38:24 -07:00
The gemma.cpp Authors
a2d9133f7d
Prepare profiler annotations for new API
...
Pass hwy::Profiler& to low-level functions.
Used ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.
PiperOrigin-RevId: 793865287
2025-08-11 17:51:38 -07:00
Jan Wassenberg
4cbf63e6f0
Prepare profiler annotations for new API
...
Pass hwy::Profiler& to low-level functions.
Used ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.
PiperOrigin-RevId: 793821255
2025-08-11 15:34:52 -07:00
Ivo Ristovski List
b56b2f05e4
Automated Code Change
...
PiperOrigin-RevId: 789876258
2025-08-01 13:29:50 -07:00
Jan Wassenberg
799c264df3
Pre-tune thread pool before matmul
...
Also improve profiler annotations - remove near-zero ones and add more for startup
PiperOrigin-RevId: 789352414
2025-07-31 08:45:26 -07:00
Jan Wassenberg
d1638587f0
1.14x batch decode speedup: parallelize RMSNorm ops
...
Activations was over-parallelized, use single pool instead.
Also improve profiler zone annotations,
pass through worker args (for tracking concurrency), now non-optional.
PiperOrigin-RevId: 788790976
2025-07-30 00:55:45 -07:00
Jan Wassenberg
e76e29ce11
De-singleton ThreadingContext so callers can pass in their own
...
weights.cc: fix BindB argument for bf16 tensors
threading_test: enable autotune
PiperOrigin-RevId: 785763618
2025-07-22 02:08:46 -07:00