nullname
c23ab465c0
feat: perf opt part4 (#43)
* wip
* refactor: rewrite dequantize_row_q4_0 by intrinsic
* log for debug
* fix q4 intrinsic
* small opt
* wip
* wip
* add vtcm_quota_size
* add perf log for hexagon-npu backend
* wip
* add log
* sync after a specific op
* increase worker thread priority
* fix unbalanced thread slice
* small slice to fit in vtcm cache
* limit the supported row element size
* opt 4_0 dequant
* fix q4 dequant
* add power_utils
* add rms_norm
* wip
* enable rms_norm f32
* fix rms_norm with param
* fix compiling flags
* use float
* fix small row size
* vectorized rms norm
* wip
* read 2 vectors
* rename
* add perf log on update
* also set handles for empty tensors
* merge some rpc functions
* opt param update
* wip
* print more log
* add struct for update param config
* add npu_device_graph_set_tensor_with_param
* merge tensor and params update
* wip
* wip
* make as template to reuse
* vectorize dequantize_row_q8_0
* opt
* avoid using union to store q data
* wip
* wip
* wip
2025-05-28 00:00:42 +08:00
nullname
295f7f5957
feat: perf opt part3 (#42)
* add f16 support to elt wise op
* wip
* Revert "wip"
This reverts commit efa88deb0e8265614fd91db3c3dba777c00e858b.
* qf32 for mul
* wip
* Revert "wip"
This reverts commit bb419f89ca4599470d61d636fe6fa1e033d62748.
* disable fp16 add/sub
* template trick
* wip
* add f16 mulmat
* add log
* fix view-like op
* add log
* fix f16 mulmat
* add quant type
* wip
* add l2fetch
* add vtcm_mem
* wip
* fix fetch
* use vtcm cache in mulmat
* revert vtcm cache
* cache plane
* small opt for plane cache
* cache plane for some element wise op
* wip
* enable fetch even on vtcm
* wip
* copy sysMonApp
* small opt
* init ltu
* add compute_params
* add op common header
* move vtcm_mem allocation to compute_param
* fallback to memcache when vtcm allocate failed
* pre-calculate quantize type
* wip
* try fix test failure
* try fix mulmat nan
* fix inf in mulmat
* remove debug logs
* wip
* small refactoring on the dequant row func
* fix typo
* improve logging
* add q4_0 and q8_0
* wip
* wip
* build hexagon libs in cmake
* wip
* fix qnn only build flag
* fix typo
* fix todo
* wip
* wip
* add to_float
* use to_float directly instead of ltu
* wip
* cache f16_to_f32 table into vtcm
* print tensor dims in log
* init device in supports_op_impl
* revert cache ltu
* wip
* wip
* fix graph calc issues by validating cache manually after each op
* add cache invalidate func
* enable cache fallback only in quantized tensors
* add option to disable quantized tensors
* propagate the asan flag to npu build
* fix asan option
* wip
* invalidate tensors after finished
* implement backend_buffer_reset
* wip
* wip
* refactoring plane cache mechanism
* wip
* split row elements across thread
* use table for f16 to f32 conversion
* sync after each op
* small refactoring to invalidate l2 cache
* wip
* opt on float fetching
* unroll for loop manually
* reduce vtcm usage
* add perf tracking for npu
* print dimensions for profiler log
* wip
* wip
* wip
* add sub proc tracker
* fix typo
* print pcycles
* wip
* wip
* prefetch rows
* add l2fetch_row
* small tweak based on perf tracer
* opt l2 fetching
* wip
2025-05-16 19:57:33 +08:00
nullname
c2b6fec63f
feat: perf opt part2 (#39)
* add qurt_thread
* add thread pool
* add thread_pool obj at device ctx
* wip
* small refactoring to fit the thread pool structure
* set start/end threads for add
* init thread pool
* fix thread creation
* split complete and pending signals
* opt mulmat
* wip
* 2 threads
* back to 4 threads
* use barrier
* remove some unnecessary packages
* add multi thread support for mul mat
* wip
* use qurt_barrier_t instead of qurt_signal_t
* wip
* wip
* add log
* split qnn cmake config
* create function to calculate the start and end func
* wip
* fix comment
* fix comment
* fix comment
* wip
* fix typo
2025-04-27 17:43:32 +08:00
nullname
beff5c4b78
feat: op perf opt (#38)
* add op define xml
* copy qnn libs in cmake
* fix htp skel path
* add windows copy file list
* wip
* add generated package
* remove unused params
* add cmake list
* set qnn sdk and hexagon sdk path
* wip
* wip
* fix tools version
* fix compiling error
* fix dims calc
* wip
* add mulmat 2d
* wip
* reduction
* wip
* wip
* fix compiling error in x64
* wip
* fix device description in emulator
* wip
* add flag
* copy necessary libs
* wip
* load HtpPrepare first for emulator
* enable custom op for 2d matrix
* verify op config before add to node
* Revert "verify op config before add to node"
This reverts commit 206dec826e560625e053c4c78e023994f993526e.
* wip
* wip
* wip
* revert tool version change
* use hexagon sdk version 5.5.0
https://docs.qualcomm.com/bundle/publicresource/topics/80-77512-2/release-notes-wrapper.html?product=1601111740010422#5.5.0
* wip
* move to sub dir
* add hexagon npu device and server lib
* fix npu lib build
* refactoring: rename QNNBackend enum
* fix compiling error
* wip
* remove qnn/backend.hpp
* add hexagon dsp host layer
* extract rpc_mem from qnn submodule
* fix dsp compiling error
* wip
* wip
* open and close npu device
* split objects into separated files
* fix linking error
* add npu_tensor
* add host graph
* map rpc buffer before usage
* fix some todos
* add shared module
* split rpc_interface from rpc_mem
* get get_dsp_arch from device
* wip
* rename host classes
* fix hexagon sdk arch getter
* fix device open
* fix linking error
* fix crash
* use tensor_data_type
* fix npu lib crash
* fix debug log print
* skip empty graph
* wip
* add log
* fix unmap fail
* fix tensor set
* remove some logs
* flush back memory after finished
* fix nb
* wip
* wip
* add helper function
* impl add op
* fix some add in test-backend-ops
* add elt wise sub and mul
* fix crash on some inplace op
* wip
* fix elt wise op calc
* wip
* split mul_mat into file
* add caps array
* wip
* wip
* print support/unsupport op
* copy lldb-server for newer android sdk
* add tensor_spec
* add assert
* fix crash when loading model
* rename cmake option
* fix name
* fix device memory and description
* fix compiling error on qnn only build
* fix some potential UBs
* fix comments
2025-04-21 12:06:16 +08:00