gemma.cpp

Apoorv Reddy 0e5b59d24d Implements FusedSoftmaxAndSampleTopK. This computes softmax over the top-K logits instead of computing softmax first and then selecting the top-K probabilities, which also avoids renormalizing. Additionally, softmax now applies temperature scaling if temp != 1.0. PiperOrigin-RevId: 727702149		2025-02-16 21:30:06 -08:00
..
bench_matmul.cc	Infra improvements (2)	2025-01-23 01:55:19 -08:00
dot-inl.h	Infra improvements (2)	2025-01-23 01:55:19 -08:00
dot_test.cc	Infra improvements (2)	2025-01-23 01:55:19 -08:00
fp_arith-inl.h	Cascaded summation for Softmax	2024-09-20 10:31:23 -07:00
gemma_matvec_test.cc	Eliminated TConfig.	2024-10-17 05:04:22 -07:00
matmul-inl.h	Threading/infra improvements.	2024-11-27 01:12:00 -08:00
matmul.h	Infra improvements (2)	2025-01-23 01:55:19 -08:00
matmul_test.cc	Infra improvements (2)	2025-01-23 01:55:19 -08:00
matmul_unit_test.cc	Threading/infra improvements.	2024-11-27 01:12:00 -08:00
matvec-inl.h	Infra improvements (2)	2025-01-23 01:55:19 -08:00
ops-inl.h	Implements FusedSoftmaxAndSampleTopK.	2025-02-16 21:30:06 -08:00
ops.h	Infra improvements (2)	2025-01-23 01:55:19 -08:00
ops_test.cc	Apply PositionalEncodingQK always in-place.	2025-01-23 07:09:30 -08:00
sum-inl.h	Minor cleanup, Windows+Bazel build fixes	2024-10-10 09:05:06 -07:00
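
The latest commit message describes taking softmax over only the top-K logits, with temperature scaling, rather than normalizing the full vocabulary and renormalizing afterwards. The sketch below illustrates that idea in plain standard C++. It is not the code in ops-inl.h: the names `SoftmaxOverTopK` and `SampleTopK`, their signatures, and the use of `std::partial_sort` and `std::discrete_distribution` are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper, not the gemma.cpp API: returns (token index,
// probability) pairs for the k largest logits, highest first. Softmax is
// computed over only those k values, so the probabilities already sum to 1
// and no renormalization step is needed.
std::vector<std::pair<std::size_t, float>> SoftmaxOverTopK(
    const std::vector<float>& logits, std::size_t k, float temperature) {
  assert(k > 0 && k <= logits.size());
  assert(temperature > 0.0f);

  // Move the indices of the k largest logits to the front, in sorted order.
  std::vector<std::size_t> idx(logits.size());
  for (std::size_t i = 0; i < idx.size(); ++i) idx[i] = i;
  std::partial_sort(
      idx.begin(), idx.begin() + k, idx.end(),
      [&](std::size_t a, std::size_t b) { return logits[a] > logits[b]; });

  // Temperature-scale and exponentiate, subtracting the maximum logit for
  // numerical stability. Dividing by temperature == 1 is a no-op.
  const float max_logit = logits[idx[0]];
  std::vector<std::pair<std::size_t, float>> top(k);
  float sum = 0.0f;
  for (std::size_t i = 0; i < k; ++i) {
    const float e = std::exp((logits[idx[i]] - max_logit) / temperature);
    top[i] = {idx[i], e};
    sum += e;
  }
  for (auto& p : top) p.second /= sum;
  return top;
}

// Hypothetical helper: draws one token index from the top-k distribution.
template <class Rng>
std::size_t SampleTopK(const std::vector<float>& logits, std::size_t k,
                       float temperature, Rng& rng) {
  const auto top = SoftmaxOverTopK(logits, k, temperature);
  std::vector<float> probs;
  probs.reserve(top.size());
  for (const auto& p : top) probs.push_back(p.second);
  std::discrete_distribution<std::size_t> dist(probs.begin(), probs.end());
  return top[dist(rng)].first;
}
```

Under these assumptions, calling `SampleTopK(logits, k, temperature, rng)` with a `std::mt19937` generator returns one of the k highest-scoring token indices. Because the shared softmax denominator cancels, the result matches full softmax followed by top-K selection and renormalization, while exponentiating only k values instead of the whole vocabulary.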