| File | Last commit | Date |
| --- | --- | --- |
| activations.h | Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul | 2024-08-16 07:52:20 -07:00 |
| backward-inl.h | Avoid duplication of RMSNorm, support all activation/weight types | 2024-08-28 01:26:55 -07:00 |
| backward.cc | Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul | 2024-08-16 07:52:20 -07:00 |
| backward.h | 1.03-1.08x decode speedup: precompute Rope theta, fuse | 2024-08-09 01:23:24 -07:00 |
| backward_scalar.h | Move benchmark_helper to evals/, weights_raw to compression/. | 2024-07-08 01:13:23 -07:00 |
| backward_scalar_test.cc | Refactor configurables. | 2024-07-10 21:30:58 -07:00 |
| backward_test.cc | 1.03-1.08x decode speedup: precompute Rope theta, fuse | 2024-08-09 01:23:24 -07:00 |
| common_scalar.h | Merge pull request #212 from szabadka:adam2 | 2024-06-07 02:25:18 -07:00 |
| forward-inl.h | 1.03-1.08x decode speedup: precompute Rope theta, fuse | 2024-08-09 01:23:24 -07:00 |
| forward.cc | 1.03-1.08x decode speedup: precompute Rope theta, fuse | 2024-08-09 01:23:24 -07:00 |
| forward.h | 1.03-1.08x decode speedup: precompute Rope theta, fuse | 2024-08-09 01:23:24 -07:00 |
| forward_scalar.h | Move benchmark_helper to evals/, weights_raw to compression/. | 2024-07-08 01:13:23 -07:00 |
| optimize_test.cc | Vectorize Rope for qkv dim not evenly divisible by number of lanes. | 2024-08-21 02:22:22 -07:00 |
| optimizer.cc | Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc | 2024-06-11 05:53:10 -07:00 |
| optimizer.h | Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul | 2024-08-16 07:52:20 -07:00 |
| prompt.h | Add missing include | 2024-06-04 10:29:12 +00:00 |
| sampler.h | Add config for att/final cap, skip max-subtract. Fixes #278 | 2024-07-01 09:45:26 -07:00 |
| test_util.h | Move benchmark_helper to evals/, weights_raw to compression/. | 2024-07-08 01:13:23 -07:00 |