(1) A function GenerateTWithContinuousBatching is added to use continuous batching when enabled.
(2) ContinuousQBatch is added as a subclass of QBatch to manage prefill, insertion of new queries, and collection of used KV caches.
(3) The unit test is also expanded to cover more diverse cases.
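The scheduling idea behind continuous batching can be sketched as follows. This is a toy model, not the code added by this change: `ToyQuery`, `ContinuousBatchSteps` and `StaticBatchSteps` are hypothetical names, and a query is reduced to the number of tokens it still has to decode. It only illustrates why refilling a freed slot immediately can take fewer decode steps than waiting for the whole batch to finish.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a query: only tracks tokens left to decode.
struct ToyQuery {
  int tokens_left;
};

// Number of decode steps when a finished slot is refilled immediately
// (continuous batching). Queries with non-positive length are skipped.
inline int ContinuousBatchSteps(const std::vector<int>& lengths,
                                size_t max_batch) {
  assert(max_batch > 0);
  std::deque<ToyQuery> pending;
  for (int len : lengths) {
    if (len > 0) pending.push_back(ToyQuery{len});
  }
  std::vector<ToyQuery> active;
  int steps = 0;
  while (!pending.empty() || !active.empty()) {
    // Insert: fill every free slot from the pending queue.
    while (active.size() < max_batch && !pending.empty()) {
      active.push_back(pending.front());
      pending.pop_front();
    }
    // One decode step for all active queries.
    for (ToyQuery& q : active) --q.tokens_left;
    ++steps;
    // Collect: drop finished queries so their slots can be reused.
    active.erase(
        std::remove_if(active.begin(), active.end(),
                       [](const ToyQuery& q) { return q.tokens_left <= 0; }),
        active.end());
  }
  return steps;
}

// Number of decode steps when each batch runs until its longest query ends.
inline int StaticBatchSteps(const std::vector<int>& lengths,
                            size_t max_batch) {
  assert(max_batch > 0);
  int steps = 0;
  for (size_t begin = 0; begin < lengths.size(); begin += max_batch) {
    const size_t end = std::min(begin + max_batch, lengths.size());
    int longest = 0;
    for (size_t i = begin; i < end; ++i) {
      longest = std::max(longest, lengths[i]);
    }
    steps += longest;
  }
  return steps;
}
```

With lengths {1, 5, 1, 1} and two slots, the short queries share the slot next to the long one instead of waiting for it, so the continuous variant needs fewer steps than the static one.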
PiperOrigin-RevId: 836090261
Uses the key_norm and query_norm layers to disambiguate between the Gemma2-2B and Gemma3-1B models.
Since Gemma3-1B is not multimodal, the presence of a ViT is not an effective disambiguator. KQ normalization is a structural disambiguator between Gemma2 and Gemma3.
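A minimal sketch of this kind of structural check, assuming the list of tensor names in a weights file is already available. `ToyModel`, `HasTensorWithPrefix`, `Disambiguate` and the per-layer name suffixes are hypothetical and used only for illustration; the actual detection code may differ.

```cpp
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
enum class ToyModel { kGemma2_2B, kGemma3_1B };

// Returns true if any tensor name starts with `prefix`.
inline bool HasTensorWithPrefix(const std::vector<std::string>& names,
                                const std::string& prefix) {
  for (const std::string& name : names) {
    if (name.compare(0, prefix.size(), prefix) == 0) return true;
  }
  return false;
}

// Assumption: the two candidates are otherwise indistinguishable, and only
// the Gemma3 file contains key_norm and query_norm tensors.
inline ToyModel Disambiguate(const std::vector<std::string>& tensor_names) {
  const bool has_kq_norm = HasTensorWithPrefix(tensor_names, "key_norm") &&
                           HasTensorWithPrefix(tensor_names, "query_norm");
  return has_kq_norm ? ToyModel::kGemma3_1B : ToyModel::kGemma2_2B;
}
```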
PiperOrigin-RevId: 833213331
This could lead to a stack overflow in B_storage.
Also do not require a specific type for query_norm_scale,
update batch sizes for attention tensors,
and make Mat shape/type checks more verbose.
PiperOrigin-RevId: 824987689
* Adds and uses a new `AttentionActivationPtrs` that holds non-owning `MatPtrs`. Acts as a view into `AttentionActivations`.
* Updates `QBatch` to hold non-owning `MatPtr`s to the kv caches.
* Enables the `MatPtrT` default constructor for simpler initializations.
* Pulls out and passes `LayerWeightsPtrs::query_norm_scale` directly. While `LayerWeightsPtrs` already held non-owning `MatPtr`s, this change avoids the need to find and construct several empty weight tensors just to construct one `query_norm_scale` tensor.
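The owning/non-owning split described above can be illustrated with a small sketch. All `Toy*` types are hypothetical simplifications: the real `MatPtr` and `AttentionActivations` carry more state (type, stride, and so on), so this only shows the view pattern and the benefit of a default-constructible pointer type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical owning matrix: allocates and frees its storage.
struct ToyMat {
  size_t rows;
  size_t cols;
  std::vector<float> data;
  ToyMat(size_t r, size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
};

// Hypothetical non-owning pointer: default-constructible, never frees.
struct ToyMatPtr {
  float* ptr = nullptr;
  size_t rows = 0;
  size_t cols = 0;
  ToyMatPtr() = default;
  explicit ToyMatPtr(ToyMat& m)
      : ptr(m.data.data()), rows(m.rows), cols(m.cols) {}
  bool IsEmpty() const { return ptr == nullptr; }
  float& At(size_t r, size_t c) const {
    assert(ptr != nullptr && r < rows && c < cols);
    return ptr[r * cols + c];
  }
};

// Owns the activation storage.
struct ToyActivations {
  ToyMat q;
  ToyMat att_out;
  ToyActivations(size_t batch, size_t dim) : q(batch, dim), att_out(batch, dim) {}
};

// View into ToyActivations; must not outlive the object it points into.
struct ToyActivationPtrs {
  ToyMatPtr q;
  ToyMatPtr att_out;
  ToyActivationPtrs() = default;
  explicit ToyActivationPtrs(ToyActivations& a) : q(a.q), att_out(a.att_out) {}
};
```

Writes through the view land in the owner's storage, and a default-constructed view is simply empty, which avoids building placeholder tensors just to fill in unused fields.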
PiperOrigin-RevId: 824584177
Group M=4..7 into same config. Add configs for power of two sizes.
Allow odd mc to enable a single range for odd M.
io.cc: warning fix (cast).
IsBlock -> !IsOneMC
benchmark_helper: best for verbosity 3, all configs for 4
ops_test: remove unused includes
PiperOrigin-RevId: 824475104
Pass ThreadingContext instead of Pools/Profiler individually, for access to Zones
Add GCPP_ZONE helper
Add Caller argument to pool.Run to enable new stats
Remove most direct dependencies on ThreadPool, prefer ParallelFor
PiperOrigin-RevId: 822934530
Replaces the FileSize() calls in BlobWriter with a computed offset.
FileSize() may not work with all implementations of File, which can cause
issues while writing.
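A minimal sketch of tracking the write position in the writer instead of querying the file. `ToyFile`, `ToyMemoryFile` and `ToyBlobWriter` are hypothetical, and the 256-byte alignment is an assumption made for the example, not a statement about the real BlobWriter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical file interface: positional writes only, no size query.
class ToyFile {
 public:
  virtual ~ToyFile() = default;
  virtual bool Write(const void* data, size_t bytes, uint64_t offset) = 0;
};

// In-memory implementation, used here only for illustration.
class ToyMemoryFile : public ToyFile {
 public:
  bool Write(const void* data, size_t bytes, uint64_t offset) override {
    if (bytes == 0) return true;
    const size_t end = static_cast<size_t>(offset) + bytes;
    if (bytes_.size() < end) bytes_.resize(end);  // gaps are zero-filled
    std::memcpy(bytes_.data() + offset, data, bytes);
    return true;
  }
  const std::vector<uint8_t>& bytes() const { return bytes_; }

 private:
  std::vector<uint8_t> bytes_;
};

// Appends blobs at offsets computed by the writer itself.
class ToyBlobWriter {
 public:
  static constexpr uint64_t kAlign = 256;        // assumed alignment
  static constexpr uint64_t kFailed = ~uint64_t{0};

  explicit ToyBlobWriter(ToyFile& file) : file_(file) {}

  // Writes one blob and returns its offset, or kFailed on write error.
  uint64_t Add(const void* data, size_t bytes) {
    assert(data != nullptr || bytes == 0);
    const uint64_t offset = next_offset_;
    if (!file_.Write(data, bytes, offset)) return kFailed;
    // Round the end of this blob up to the next multiple of kAlign.
    next_offset_ = (offset + bytes + kAlign - 1) / kAlign * kAlign;
    return offset;
  }

  uint64_t next_offset() const { return next_offset_; }

 private:
  ToyFile& file_;
  uint64_t next_offset_ = 0;
};
```

Because the writer is the only party appending to the file, the running offset carries the same information a size query would, without depending on the File implementation supporting it.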
PiperOrigin-RevId: 822646338
Adds a cache variable to CMakePresets.json to enforce modern policies automatically.
This ensures all developers can run cmake --preset windows without hitting legacy compatibility or deprecation issues.