gemma.cpp

Jan Wassenberg 0f70f285e0	2025-06-20 08:59:53 -07:00
1.1x prefill and decode speedup (attention/activations)

Optimizations
- Better load-balancing in attention threading (Previously, clusters were limited by #heads)
- Add MulByConstTo to avoid zero-init
- Parallel activations

Cleanup
- Prepare for RowPtr in A or B
- Pass through thread_id to ops
- Avoid warning in bench_matmul

PiperOrigin-RevId: 773723423

allocator.cc	Cleanup: remove unused kCyclic, remove 2 suffix	2025-05-13 01:06:41 -07:00
allocator.h	1.31x batch prefill, 1.24x batch decode speedup: NUMA binding	2025-05-16 07:42:13 -07:00
args.h	Move fields, io* and blob* from compression/ into io/	2025-05-06 11:17:19 -07:00
basics.h	Minor cleanup: enable 0,0 Extents2D, add SerializedSpan typedef, include fixes	2025-04-08 03:35:55 -07:00
mat.cc	Split gemma-inl into separate source files	2025-06-05 05:36:44 -07:00
mat.h	1.1x prefill and decode speedup (attention/activations)	2025-06-20 08:59:53 -07:00
test_util.h	Minor cleanup/fixes:	2024-09-09 06:58:09 -07:00
threading.cc	Fix thread name when skipping packages/clusters	2025-06-01 23:50:11 -07:00
threading.h	1.1x prefill and decode speedup (attention/activations)	2025-06-20 08:59:53 -07:00
threading_context.cc	Rename-only: remove Allocator2 etc suffixes now that refactoring is complete	2025-05-06 09:12:43 -07:00
threading_context.h	Batch inference fixes: set pos during prefill, fix assert	2025-06-17 07:09:44 -07:00
threading_test.cc	Rename-only: remove Allocator2 etc suffixes now that refactoring is complete	2025-05-06 09:12:43 -07:00
topology.cc	Fix thread name when skipping packages/clusters	2025-06-01 23:50:11 -07:00
topology.h	Fix thread name when skipping packages/clusters	2025-06-01 23:50:11 -07:00