Nikhil Dev Goyal
|
259b757aef
|
Use Lookup8 and detail::IsFull(d) in FastSigmoid
Fix targeted for scalable architectures
PiperOrigin-RevId: 888633434
|
2026-03-24 06:36:55 -07:00 |
Nikhil Dev Goyal
|
90f3de7f15
|
Use parallel blend chain path in FastSigmoid on architectures with >= 32 registers
PiperOrigin-RevId: 886178215
|
2026-03-19 07:54:05 -07:00 |
Nikhil Dev Goyal
|
50144738f1
|
Change the calculation from (ax+b)/(cx+d) to (x+b')/(c'x+d'). This replaces a MulAdd with an Add, reducing port contention on modern CPUs and thus increasing throughput.
Also removes the need for one register to hold b, which is 1.0 here.
PiperOrigin-RevId: 886170146
|
2026-03-19 07:36:52 -07:00 |
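The rewrite described in commit 50144738f1 follows from dividing the numerator and denominator by a, which gives b' = b/a, c' = c/a and d' = d/a. The scalar sketch below only illustrates that algebra. It is not the Highway implementation: the function names are hypothetical, and the coefficients in the usage note are arbitrary values rather than the ones FastSigmoid uses.

```cpp
#include <cmath>

// Original form: (a*x + b) / (c*x + d).
// The numerator costs one multiply-add.
inline double RationalOriginal(double x, double a, double b, double c,
                               double d) {
  return (a * x + b) / (c * x + d);
}

// Coefficients after dividing numerator and denominator by a.
struct Normalized {
  double b;  // b' = b / a
  double c;  // c' = c / a
  double d;  // d' = d / a
};

// Precompute the normalized coefficients once (requires a != 0).
inline Normalized Normalize(double a, double b, double c, double d) {
  return {b / a, c / a, d / a};
}

// Normalized form: (x + b') / (c'*x + d').
// The numerator is now a plain add; only the denominator is a multiply-add.
inline double RationalNormalized(double x, const Normalized& n) {
  return (x + n.b) / (n.c * x + n.d);
}

// Largest absolute difference between the two forms on [lo, hi],
// sampled every `step`. Used here only to check the algebra.
inline double MaxAbsDifference(double a, double b, double c, double d,
                               double lo, double hi, double step) {
  const Normalized n = Normalize(a, b, c, d);
  double worst = 0.0;
  for (double x = lo; x <= hi; x += step) {
    const double diff =
        std::fabs(RationalOriginal(x, a, b, c, d) - RationalNormalized(x, n));
    if (diff > worst) worst = diff;
  }
  return worst;
}
```

For example, `MaxAbsDifference(2.0, 3.0, 0.5, 4.0, -3.0, 3.0, 0.25)` compares the two forms where the denominator stays away from zero; any difference comes from floating-point rounding only.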
Nikhil Dev Goyal
|
5081341200
|
Use CappedTag to prevent potential out of bound reads.
PiperOrigin-RevId: 879141747
|
2026-03-05 10:40:52 -08:00 |
Nikhil Dev Goyal
|
6721dddf38
|
Implement FastSigmoid.
PiperOrigin-RevId: 878453196
|
2026-03-04 06:12:33 -08:00 |
Nikhil Dev Goyal
|
dd268ddbe8
|
Add FastGelu activation function in a newly created fast_ops-inl.h file.
This replaces the Tanh call with a FastTanh call in the Gelu function from math-inl.h.
PiperOrigin-RevId: 876339830
|
2026-02-27 11:14:47 -08:00 |