llama.cpp/ggml
nullname c2b6fec63f
feat: perf opt part2 (#39)
* add qurt_thread

* add thread pool

* add thread_pool obj at device ctx

* wip

* small refactoring to fit the thread pool structure

* set start/end threads for add

* init thread pool

* fix thread creation

* split complete and pending signals

* opt mulmat

* wip

* 2 threads

* back to 4 threads

* use barrier

* remove some unnecessary package

* add multi thread support for mul mat

* wip

* use qurt_barrier_t instead of qurt_signal_t

* wip

* wip

* add log

* split qnn cmake config

* create function to calculate the start and end func

* wip

* fix comment

* fix comment

* fix comment

* wip

* fix typo
2025-04-27 17:43:32 +08:00
Name            Last commit                                 Date
cmake           scripts : update sync + fix cmake merge     2025-03-27 10:09:29 +02:00
include         Merge branch 'master' into dev-refactoring  2025-04-24 21:33:23 +08:00
src             feat: perf opt part2 (#39)                  2025-04-27 17:43:32 +08:00
.gitignore      vulkan : cmake integration (#8119)          2024-07-13 18:12:39 +02:00
CMakeLists.txt  Merge branch 'master' into dev-refactoring  2025-04-24 21:33:23 +08:00