| File | Last commit message | Date |
| --- | --- | --- |
| bench_matmul.cc | MatMul simplification, threading strategy improvements | 2025-09-03 21:45:07 -07:00 |
| dot-inl.h | f32 LoopKC: 1.37x(M=512), 1.19(M=128) single-K F32,BF16 matmul speedup on SKX | 2025-08-28 08:55:50 -07:00 |
| dot_test.cc | Further adjust dot_test threshold (numerics) | 2025-09-05 05:50:16 -07:00 |
| fp_arith-inl.h | Decouple MatMul from gemma-inl: precompile for all input types | 2025-05-27 07:08:58 -07:00 |
| matmul-inl.h | 1.03x speedup: fused FFN | 2025-09-15 10:26:37 -07:00 |
| matmul.cc | 1.03x speedup: fused FFN | 2025-09-15 10:26:37 -07:00 |
| matmul.h | 1.03x speedup: fused FFN | 2025-09-15 10:26:37 -07:00 |
| matmul_static-inl.h | 1.03x speedup: fused FFN | 2025-09-15 10:26:37 -07:00 |
| matmul_static.h | 1.03x speedup: fused FFN | 2025-09-15 10:26:37 -07:00 |
| matmul_static_bf16.cc | Speed up builds by skipping rarely used targets | 2025-06-17 05:44:20 -07:00 |
| matmul_static_f32.cc | Speed up builds by skipping rarely used targets | 2025-06-17 05:44:20 -07:00 |
| matmul_static_nuq.cc | Speed up builds by skipping rarely used targets | 2025-06-17 05:44:20 -07:00 |
| matmul_static_sfp.cc | Speed up builds by skipping rarely used targets | 2025-06-17 05:44:20 -07:00 |
| matmul_test.cc | 1.03x speedup: fused FFN | 2025-09-15 10:26:37 -07:00 |
| ops-inl.h | Used hn::BroadcastLane instead of Set(..., x.raw) | 2025-09-25 09:42:03 -07:00 |
| ops.h | De-singleton ThreadingContext so callers can pass in their own | 2025-07-22 02:08:46 -07:00 |
| ops_test.cc | Replace mt19937 with new generator to enable parallel sampling | 2025-09-04 23:49:10 -07:00 |
| sum-inl.h | Minor cleanup, Windows+Bazel build fixes | 2024-10-10 09:05:06 -07:00 |