* Adds and uses a new `AttentionActivationPtrs` that holds non-owning `MatPtrs`. Acts as a view into `AttentionActivations`.
* Updates `QBatch` to hold non-owning `MatPtr`s to the kv caches.
* Enables the `MatPtrT` default constructor for simpler initializations.
* Pulls out and passes `LayerWeightsPtrs::query_norm_scale` directly. While `LayerWeightsPtrs` already held non-owning `MatPtr`s, this change avoids the need to find and construct several empty weight tensors just to construct one `query_norm_scale` tensor.
PiperOrigin-RevId: 824584177
Group M=4..7 into same config. Add configs for power of two sizes.
Allow odd mc to enable a single range for odd M.
io.cc: warning fix(cast).
IsBlock -> !IsOneMC
benchmark_helper: best for verbosity 3, all configs for 4
ops_test: remove unused includes
PiperOrigin-RevId: 824475104
Pass ThreadingContext instead of Pools/Profiler individually, for access to Zones
Add GCPP_ZONE helper
Add Caller argument to pool.Run to enable new stats
Remove most direct dependencies on ThreadPool, prefer ParallelFor
PiperOrigin-RevId: 822934530
Updates the FileSize() calls in BlobWriter to instead use a computed offset.
FileSize() may not work with all implementations of File which can cause issues
while writing.
PiperOrigin-RevId: 822646338
Adding the following cache variable in the CMakePresets.json to enforce modern policies automatically
This ensures all developers can run cmake --preset windows without hitting legacy compatibility or deprecation issues.
Previously, this happened concurrently with the matmul autotune, which could lead to incorrect outcomes.
threading: de-singleton Pinning (no longer stores affinity); pass PoolWorkerMapping; fix Pool dtor order
Also enable SPR target (Zen4 is AMD-only),
update Highway version for renamed Thread()->GlobalIdx().
PiperOrigin-RevId: 816223017
matmul-inl: support CView=StridedView or RowPtrs; rename to C_MC_NC
matmul.cc: Allow 1 more rep for MC/NC to allow half-sized tiles, which helps.
PiperOrigin-RevId: 807291701
The register-tiled version achieves a speed-up by a factor of about 9.7 over the previous attention function on an AVX3-enabled machine.
PiperOrigin-RevId: 804913784
MMLoops: move dispatch code out, use overloads
split build target into matmul_env (for MatMulEnv/MMOptions)
weights: no longer call BindB
Fix potential out of bounds in gemma_batch_bench
PiperOrigin-RevId: 804895985
Lift DecompressA out of main autotuner to prevent interference
Also use kMaxNR / kNR constants instead of extra args
Fix: only require vector alignment, not cache alignment
PiperOrigin-RevId: 804333769