gemma.cpp

Jan Wassenberg	d1638587f0	2025-07-30 00:55:45 -07:00
1.14x batch decode speedup: parallelize RMSNorm ops
Activations was over-parallelized, use single pool instead. Also improve profiler zone annotations, pass through worker args (for tracking concurrency), now non-optional.
PiperOrigin-RevId: 788790976
bench_matmul.cc	De-singleton ThreadingContext so callers can pass in their own	2025-07-22 02:08:46 -07:00
dot-inl.h	Huge refactor of weight handling and model loading.	2025-05-06 04:44:21 -07:00
dot_test.cc	De-singleton ThreadingContext so callers can pass in their own	2025-07-22 02:08:46 -07:00
fp_arith-inl.h	Decouple MatMul from gemma-inl: precompile for all input types	2025-05-27 07:08:58 -07:00
gemma_matvec_test.cc	De-singleton ThreadingContext so callers can pass in their own	2025-07-22 02:08:46 -07:00
matmul-inl.h	1.14x batch decode speedup: parallelize RMSNorm ops	2025-07-30 00:55:45 -07:00
matmul.cc	1.1x prefill and decode speedup (attention/activations)	2025-06-20 08:59:53 -07:00
matmul.h	De-singleton ThreadingContext so callers can pass in their own	2025-07-22 02:08:46 -07:00
matmul_static-inl.h	1.16x decode speedup: remove last MatVec in Attention	2025-06-02 09:40:29 -07:00
matmul_static.h	1.16x decode speedup: remove last MatVec in Attention	2025-06-02 09:40:29 -07:00
matmul_static_bf16.cc	Speed up builds by skipping rarely used targets	2025-06-17 05:44:20 -07:00
matmul_static_f32.cc	Speed up builds by skipping rarely used targets	2025-06-17 05:44:20 -07:00
matmul_static_nuq.cc	Speed up builds by skipping rarely used targets	2025-06-17 05:44:20 -07:00
matmul_static_sfp.cc	Speed up builds by skipping rarely used targets	2025-06-17 05:44:20 -07:00
matmul_test.cc	De-singleton ThreadingContext so callers can pass in their own	2025-07-22 02:08:46 -07:00
matvec-inl.h	Replace last ConstMat with MatPtr	2025-05-13 10:55:22 -07:00
ops-inl.h	1.14x batch decode speedup: parallelize RMSNorm ops	2025-07-30 00:55:45 -07:00
ops.h	De-singleton ThreadingContext so callers can pass in their own	2025-07-22 02:08:46 -07:00
ops_test.cc	1.14x batch decode speedup: parallelize RMSNorm ops	2025-07-30 00:55:45 -07:00
sum-inl.h	Minor cleanup, Windows+Bazel build fixes	2024-10-10 09:05:06 -07:00