gemma.cpp/util
Jan Wassenberg 0f70f285e0 1.1x prefill and decode speedup (attention/activations)
Optimizations
- Better load-balancing in attention threading
  (previously, clusters were limited by #heads)
- Add MulByConstTo to avoid zero-init
- Parallel activations

Cleanup
- Prepare for RowPtr in A or B
- Pass through thread_id to ops
- Avoid warning in bench_matmul

PiperOrigin-RevId: 773723423
2025-06-20 08:59:53 -07:00
allocator.cc Cleanup: remove unused kCyclic, remove 2 suffix 2025-05-13 01:06:41 -07:00
allocator.h 1.31x batch prefill, 1.24x batch decode speedup: NUMA binding 2025-05-16 07:42:13 -07:00
args.h Move fields, io* and blob* from compression/ into io/ 2025-05-06 11:17:19 -07:00
basics.h Minor cleanup: enable 0,0 Extents2D, add SerializedSpan typedef, include fixes 2025-04-08 03:35:55 -07:00
mat.cc Split gemma-inl into separate source files 2025-06-05 05:36:44 -07:00
mat.h 1.1x prefill and decode speedup (attention/activations) 2025-06-20 08:59:53 -07:00
test_util.h Minor cleanup/fixes: 2024-09-09 06:58:09 -07:00
threading.cc Fix thread name when skipping packages/clusters 2025-06-01 23:50:11 -07:00
threading.h 1.1x prefill and decode speedup (attention/activations) 2025-06-20 08:59:53 -07:00
threading_context.cc Rename-only: remove Allocator2 etc suffixes now that refactoring is complete 2025-05-06 09:12:43 -07:00
threading_context.h Batch inference fixes: set pos during prefill, fix assert 2025-06-17 07:09:44 -07:00
threading_test.cc Rename-only: remove Allocator2 etc suffixes now that refactoring is complete 2025-05-06 09:12:43 -07:00
topology.cc Fix thread name when skipping packages/clusters 2025-06-01 23:50:11 -07:00
topology.h Fix thread name when skipping packages/clusters 2025-06-01 23:50:11 -07:00