- Compute row size for the temp buffer based on the output of the first pass. - Update shader addressing math to use the output row size - Pass the output row size as "ncols_output", what used to be "ncols_output" is now "k" For the common case of K=40 and src0=(200000,1,1,1), this reduces the temporary buffer from about 3.2MB to 500KB. |
||
|---|---|---|
| .. | ||
| cmake | ||
| include | ||
| src | ||
| .gitignore | ||
| CMakeLists.txt | ||