nullname c23ab465c0
feat: perf opt part4 (#43)
* wip

* refactor: rewrite dequantize_row_q4_0 with intrinsics (a scalar reference is sketched after this list)

* log for debug

* fix q4 intrinsic

* small opt

* wip

* wip

* add vtcm_quota_size

* add perf log for hexagon-npu backend

* wip

* add log

* sync after a specific op

* increase worker thread priority (see the priority sketch after this list)

* fix unbalanced thread slice (a balanced-slice sketch follows the list)

* small slice to fit in VTCM cache (see the VTCM slicing sketch after this list)

* limit the supported row element size

* opt 4_0 dequant

* fix q4 dequant

* add power_utils

* add rms_norm

* wip

* enable rms_norm f32

* fix rms_norm with param

* fix compiling flags

* use float

* fix small row size

* vectorize rms_norm (a scalar reference is sketched after this list)

* wip

* read 2 vectors

* rename

* add perf log on update

* also set handles for empty tensors

* merge some rpc functions

* opt param update

* wip

* print more log

* add struct for update param config

* add npu_device_graph_set_tensor_with_param

* merge tensor and params update (a hypothetical payload sketch follows the list)

* wip

* wip

* make it a template to reuse

* vectorize dequantize_row_q8_0 (sketched with a reusable template after this list)

* opt

* avoid using a union to store q data

* wip

* wip

* wip
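
For context, a minimal scalar sketch of what dequantize_row_q4_0 computes, following ggml's q4_0 block layout: one per-block scale plus 16 bytes packing 32 4-bit quants. The scale is simplified to a plain float here (real ggml stores it as fp16); the intrinsic rewrite replaces the inner loop with HVX vector ops.

```cpp
#include <cstdint>
#include <cstddef>

#define QK4_0 32

// ggml's q4_0 block: one scale plus 16 bytes packing 32 4-bit quants.
// Scale simplified to float here; real ggml stores it as fp16.
struct block_q4_0 {
    float   d;
    uint8_t qs[QK4_0 / 2]; // low nibble = x[j], high nibble = x[j + 16]
};

void dequantize_row_q4_0_ref(const block_q4_0 *x, float *y, size_t k) {
    const size_t nb = k / QK4_0;
    for (size_t i = 0; i < nb; ++i) {
        const float d = x[i].d;
        for (size_t j = 0; j < QK4_0 / 2; ++j) {
            // nibbles are unsigned 4-bit values biased by 8
            const int lo = (x[i].qs[j] & 0x0F) - 8;
            const int hi = (x[i].qs[j] >> 4)   - 8;
            y[i * QK4_0 + j]             = lo * d;
            y[i * QK4_0 + j + QK4_0 / 2] = hi * d;
        }
    }
}
```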
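
The thread-priority bump, sketched with the POSIX scheduling API as a stand-in; the Hexagon backend runs on QuRT, whose priority API differs, so treat this as an illustration only.

```cpp
#include <pthread.h>
#include <sched.h>

// Raise a worker thread to max real-time priority (POSIX stand-in;
// QuRT's API differs). May require elevated privileges to succeed.
static void raise_worker_priority(pthread_t t) {
    sched_param sp{};
    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    pthread_setschedparam(t, SCHED_FIFO, &sp);
}
```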
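
A common fix for unbalanced thread slices, and plausibly the shape of the one here: spread the n_rows % n_threads leftover rows over the first threads instead of leaving the whole remainder to the last one. Names are illustrative.

```cpp
#include <algorithm>
#include <cstddef>

// Balanced row slicing: the first (n_rows % n_threads) threads each
// take one extra row, so no single thread absorbs the remainder.
void thread_slice(size_t n_rows, size_t n_threads, size_t ith,
                  size_t *row_start, size_t *row_end) {
    const size_t base = n_rows / n_threads;
    const size_t rem  = n_rows % n_threads;
    *row_start = ith * base + std::min(ith, rem);
    *row_end   = *row_start + base + (ith < rem ? 1 : 0);
}
```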
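
A hypothetical sketch of slicing work to fit the VTCM quota: cap the rows per slice so each slice's working set stays under a per-thread byte budget. The quota constant and function names are assumptions, not the backend's real values.

```cpp
#include <algorithm>
#include <cstddef>

constexpr size_t kVtcmQuotaBytes = 256 * 1024; // assumed quota, not the real value

// Cap rows per slice so each slice's working set fits in the VTCM quota.
size_t rows_per_slice(size_t row_bytes) {
    return std::max<size_t>(1, kVtcmQuotaBytes / row_bytes);
}

void for_each_slice(size_t n_rows, size_t row_bytes,
                    void (*process)(size_t first, size_t count)) {
    const size_t step = rows_per_slice(row_bytes);
    for (size_t r = 0; r < n_rows; r += step) {
        process(r, std::min(step, n_rows - r));
    }
}
```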
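
What the vectorized rms_norm computes, as a scalar reference: y = x / sqrt(mean(x^2) + eps), with eps supplied through the op's params (cf. "fix rms_norm with param"). The two accumulators mirror the "read 2 vectors" unroll of the HVX loop.

```cpp
#include <cmath>
#include <cstddef>

// Scalar reference for rms_norm f32: y = x / sqrt(mean(x^2) + eps).
// Two accumulators mirror the two-vector unroll of the HVX loop.
void rms_norm_f32_ref(const float *x, float *y, size_t n, float eps) {
    float sum0 = 0.0f, sum1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        sum0 += x[i]     * x[i];
        sum1 += x[i + 1] * x[i + 1];
    }
    if (i < n) sum0 += x[i] * x[i]; // odd-length tail
    const float scale = 1.0f / sqrtf((sum0 + sum1) / (float) n + eps);
    for (size_t j = 0; j < n; ++j) {
        y[j] = x[j] * scale;
    }
}
```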
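
A hypothetical shape for the merged tensor-and-params update (cf. npu_device_graph_set_tensor_with_param): one RPC payload carries both the tensor handle and the op params, halving round-trips. Every field name and size below is illustrative, not the real ABI.

```cpp
#include <cstdint>

// Illustrative payload for the merged update: tensor handle and op
// params travel in one RPC instead of two. Not the real ABI.
struct npu_device_tensor_update_config {
    uint64_t tensor_handle; // remote tensor to bind; 0 for an empty tensor
    int32_t  op;            // op the params belong to
    int32_t  n_params;      // valid entries in params[]
    int32_t  params[8];     // inline op params (floats shipped as bit patterns)
};
```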
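
Finally, the q8_0 path together with the "make it a template to reuse" idea: q8_0 dequant is a single multiply per element (no nibble unpacking), which maps directly onto wide vector multiplies, and a templated row driver (hypothetical names) shares the block loop across quant types. As above, the fp16 scale is simplified to float.

```cpp
#include <cstdint>
#include <cstddef>

#define QK8_0 32

// q8_0 block: one scale (simplified to float; ggml uses fp16) plus
// 32 signed 8-bit quants. Dequant is a single multiply per element.
struct block_q8_0 {
    float  d;
    int8_t qs[QK8_0];
};

inline void dequant_block_q8_0(const block_q8_0 &b, float *y) {
    for (size_t j = 0; j < QK8_0; ++j) y[j] = b.qs[j] * b.d;
}

// Hypothetical templated row driver: one block loop reused across
// quant types.
template <typename Block, size_t BlockSize,
          void (*DequantBlock)(const Block &, float *)>
void dequantize_row(const Block *x, float *y, size_t k) {
    for (size_t i = 0; i < k / BlockSize; ++i) {
        DequantBlock(x[i], y + i * BlockSize);
    }
}

// usage: dequantize_row<block_q8_0, QK8_0, dequant_block_q8_0>(x, y, k);
```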