* optimize flash attention kernel by improving score computation and online softmax update
* wip
* Refactor online softmax update in flash attention kernel for improved performance
* Optimize flash attention kernel by replacing float array with HVX_Vector for score computation
* wip

| | | |
|---|---|---|
| htp | ||
| CMakeLists.txt | ||
| ggml-hexagon.cpp | ||
| htp-utils.c | ||
| htp-utils.h | ||
| op-desc.h | ||