This implements a variation of the perf logger. Rather than timing each operation individually, which effectively places a barrier between operations, it puts the timing boundaries where we already synchronize and times the groups of work that normally overlap. This can help show whether individual operations need to be optimized or whether the group is already running efficiently. GGML_VK_PERF_LOGGER_CONCURRENT=1 enables the new mode (when GGML_VK_PERF_LOGGER is also set). GGML_VK_SYNC_LOGGER=1 replaces the ENABLE_SYNC_LOGGING compile-time switch.
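
As a rough illustration of how the two perf logger variables combine, here is a minimal sketch. The enum, the function names, and the rule that a variable counts as enabled when set to a non-empty value are assumptions made for this example; they are not taken from the backend sources, which may parse the variables differently.

```cpp
#include <cstdlib>

// Hypothetical names for illustration only; not taken from the ggml sources.
enum class PerfLoggerMode { Off, PerOperation, Concurrent };

// Assumption: a variable counts as enabled when it is set to a non-empty value.
static bool flag_enabled(const char * value) {
    return value != nullptr && value[0] != '\0';
}

// The concurrent mode only takes effect when the base perf logger is enabled too.
static PerfLoggerMode select_mode(const char * perf_logger, const char * concurrent) {
    if (!flag_enabled(perf_logger)) {
        return PerfLoggerMode::Off;
    }
    return flag_enabled(concurrent) ? PerfLoggerMode::Concurrent
                                    : PerfLoggerMode::PerOperation;
}

// Reads the two variables named in the description from the process environment.
static PerfLoggerMode mode_from_environment() {
    return select_mode(std::getenv("GGML_VK_PERF_LOGGER"),
                       std::getenv("GGML_VK_PERF_LOGGER_CONCURRENT"));
}
```

In this sketch, setting only GGML_VK_PERF_LOGGER_CONCURRENT=1 leaves logging off, which mirrors the "when GGML_VK_PERF_LOGGER is also set" condition in the description.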