llama.cpp/ggml/src/ggml-qnn/npu/device
nullname c23ab465c0
feat: perf opt part4 (#43)
* wip

* refactor: rewrite dequantize_row_q4_0 by intrinsic

* log for debug

* fix q4 intrinsic

* small opt

* wip

* wip

* add vtcm_quota_size

* add perf log for hexagon-npu backend

* wip

* add log

* sync after a specific op

* increase worker thread priority

* fix unbalanced thread slice

* small slice to fit in vtcm cache

* limit the supported row element size

* opt 4_0 dequant

* fix q4 dequant

* add power_utils

* add rms_norm

* wip

* enable rms_norm f32

* fix rms_norm with param

* fix compiling flags

* use float

* fix small row size

* vectorized rms norm

* wip

* read 2 vectors

* rename

* add perf log on update

* set empty tensors handle also

* merge some rpc functions

* opt param update

* wip

* print more log

* add struct for update param config

* add npu_device_graph_set_tensor_with_param

* merge tensor and params update

* wip

* wip

* make as template to reuse

* vectorize dequantize_row_q8_0

* opt

* avoid using union to store q data

* wip

* wip

* wip
2025-05-28 00:00:42 +08:00
device.cpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00
graph.cpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00
graph.hpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00
op_impl.cpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00
op_impl.hpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00
op_mul_mat.cpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00
op_mul_mat.hpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00
op_types.hpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00
quants.cpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00
quants.hpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00
tensor.hpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00
thread_pool.hpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00
util.hpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00
vtcm_mem.hpp feat: perf opt part4 (#43) 2025-05-28 00:00:42 +08:00