gemma.cpp/ops
Apoorv Reddy 0e5b59d24d Implements FusedSoftmaxAndSampleTopK.
This computes softmax over only the top-K logits, instead of computing softmax over all logits first and then selecting the top-K probabilities, which also avoids a renormalization step. Additionally, softmax now applies temperature scaling if temp != 1.0.

PiperOrigin-RevId: 727702149
2025-02-16 21:30:06 -08:00
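
The idea in the commit message above can be sketched roughly as follows. This is a minimal scalar illustration, not the code in ops-inl.h: the names `TopKSoftmax` and `SampleTopK`, their signatures, and the use of `std::partial_sort` and `std::discrete_distribution` are all assumptions made for the example. The actual FusedSoftmaxAndSampleTopK may differ in interface and implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (not the gemma.cpp API): selects the k largest logits
// and computes a temperature-scaled softmax over those k values only.
// Returns (token index, probability) pairs, highest logit first.
std::vector<std::pair<size_t, float>> TopKSoftmax(
    const std::vector<float>& logits, size_t k, float temperature) {
  assert(k >= 1 && k <= logits.size() && temperature > 0.0f);

  // Pair each logit with its index so the token id survives the sort.
  std::vector<std::pair<float, size_t>> scored(logits.size());
  for (size_t i = 0; i < logits.size(); ++i) scored[i] = {logits[i], i};

  // Move the k largest logits to the front, in descending order.
  std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                    [](const std::pair<float, size_t>& a,
                       const std::pair<float, size_t>& b) {
                      return a.first > b.first;
                    });

  // Softmax over the k survivors. Subtracting the max logit keeps exp()
  // from overflowing; dividing by temperature applies the scaling.
  const float max_logit = scored[0].first;
  std::vector<std::pair<size_t, float>> probs(k);
  float sum = 0.0f;
  for (size_t i = 0; i < k; ++i) {
    const float e = std::exp((scored[i].first - max_logit) / temperature);
    probs[i] = {scored[i].second, e};
    sum += e;
  }
  for (auto& p : probs) p.second /= sum;
  return probs;
}

// Hypothetical helper: samples one token index from the top-k distribution.
size_t SampleTopK(const std::vector<float>& logits, size_t k,
                  float temperature, std::mt19937& rng) {
  const auto probs = TopKSoftmax(logits, k, temperature);
  std::vector<float> weights(probs.size());
  for (size_t i = 0; i < probs.size(); ++i) weights[i] = probs[i].second;
  std::discrete_distribution<int> dist(weights.begin(), weights.end());
  return probs[static_cast<size_t>(dist(rng))].first;
}
```

Because the shared normalizer cancels, softmax over the top-K logits gives the same distribution as taking a full softmax, keeping the top-K probabilities, and renormalizing them, which is why the separate renormalization pass can be skipped. In this sketch the exponentials are also computed for K values rather than for every logit.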
..
bench_matmul.cc Infra improvements (2) 2025-01-23 01:55:19 -08:00
dot-inl.h Infra improvements (2) 2025-01-23 01:55:19 -08:00
dot_test.cc Infra improvements (2) 2025-01-23 01:55:19 -08:00
fp_arith-inl.h Cascaded summation for Softmax 2024-09-20 10:31:23 -07:00
gemma_matvec_test.cc Eliminated TConfig. 2024-10-17 05:04:22 -07:00
matmul-inl.h Threading/infra improvements. 2024-11-27 01:12:00 -08:00
matmul.h Infra improvements (2) 2025-01-23 01:55:19 -08:00
matmul_test.cc Infra improvements (2) 2025-01-23 01:55:19 -08:00
matmul_unit_test.cc Threading/infra improvements. 2024-11-27 01:12:00 -08:00
matvec-inl.h Infra improvements (2) 2025-01-23 01:55:19 -08:00
ops-inl.h Implements FusedSoftmaxAndSampleTopK. 2025-02-16 21:30:06 -08:00
ops.h Infra improvements (2) 2025-01-23 01:55:19 -08:00
ops_test.cc Apply PositionalEncodingQK always in-place. 2025-01-23 07:09:30 -08:00
sum-inl.h Minor cleanup, Windows+Bazel build fixes 2024-10-10 09:05:06 -07:00