Commit Graph

109 Commits

Author SHA1 Message Date
hongruichen 332514cd5c qnn fix: update device capabilities for quantized types in qnn-lib to improve compatibility 2025-06-23 16:04:01 +08:00
nullname af620a12f7
feat: flash attention support for hexagon-npu (#45)
* add flash attn op

* expand src tensor size

* add flash attn sources

* add quantize row functions

* make a separated file for vec_dot

* wip

* wip

* refactor: rename quants.hpp includes and add vec_dot to type traits

* add flash_attn impl

* split vec_scale_f32

* move vec_reduction_qf32 to vec_ops

* add vec_scale_f16

* opt

* add vec_mad

* implement vec_mad_f16

* opt

* add op template

* opt

* add align version

* enable flash attn

* wip

* log print improve

* add profiler log

* wip

* wip

* add multi sub proc perf tracker

* increase log buffer

* remove sub proc pcycle

* wip

* wip

* add prefetch for vec_dot

* wip

* wip

* opt f16 vec dot

* opt f16 vecdot

* reuse vec_dot_product_impl in vec dot f32

* small opt to unblock pipeline

* opt on aligned address

* wip

* Revert "opt on aligned address"

This reverts commit 27be1eb61a7d29d2f5fa6f90383e1b5d7fdf9b6a.

* add profiler log at thread_pool

* wip

* invalidate all...

* Reapply "opt on aligned address"

This reverts commit f075a4c4586e32b7e5819c1fe7f9b6ed218b1767.

* add is_constant for tensor config

* disable align tensor opt in mul_mat

* wip

* wip

* vec_scale_impl: unrolling the loop

* wip

* wip

* replace reinterpret_cast with direct pointer access for write/read buffers

* add fetch

* wip

* wip

* wip

* add log

* check tensor shape at flash_attn

* wip

* wip

* fix: update tensor type handling in flash_attn_impl

* wip

* fix: align cache size

* fix: qf16->hf

* fix: swap order of elements in vector combine for correct scaling

* fix: opt f16 scale and mad

* fix leftover fetch

* wip

* load into vector pair

* opt cache size calculation in flash_attn_impl

* refactoring: hold vtcm at thread local object

* wip

* add profiler log

* mark tensors as modified

* restrict tensor invalidation to the first thread in compute_impl

* Revert "restrict tensor invalidation to the first thread in compute_impl"

This reverts commit 0a8ff2b1bcf366097c16d7437c091382eacbef8b.

* invalidate last tensor in compute_impl

* invalidate last tensor in compute function

* wip

* refactor dequantize_row_q4_0 to simplify vector alignment

* wip

* refactoring: move VTCM quota calculation to thread pool

* wip

* fix: correct condition check for HEXAGON_SDK_ROOT existence

* wip

* wip

* wip

* wip

* fix: update condition checks to match the naming

* fix: improve tensor handling checks and logging in graph and operation implementations

* wip
2025-06-18 10:32:08 +08:00
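The flash-attention bullets above (vec_dot, vec_scale_f32/f16, vec_mad, a running scale) follow the standard online-softmax accumulation; here is a scalar sketch of that scheme — the function name and layout are hypothetical, and the real hexagon-npu code vectorizes each step with HVX intrinsics:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Scalar sketch of online-softmax flash attention for a single query row:
// walk the K/V rows once, tracking a running max (m), normalizer (l) and
// un-normalized output (acc), so the full score matrix is never materialized.
std::vector<float> flash_attn_row(const std::vector<float> &q,
                                  const std::vector<std::vector<float>> &K,
                                  const std::vector<std::vector<float>> &V,
                                  float scale) {
    const size_t d = q.size();
    float m = -INFINITY, l = 0.0f;
    std::vector<float> acc(V[0].size(), 0.0f);
    for (size_t i = 0; i < K.size(); ++i) {
        float s = 0.0f;
        for (size_t j = 0; j < d; ++j) s += q[j] * K[i][j];  // vec_dot step
        s *= scale;
        const float m_new = std::max(m, s);
        const float corr  = std::exp(m - m_new);  // rescale old accumulator
        const float p     = std::exp(s - m_new);
        for (size_t j = 0; j < acc.size(); ++j)
            acc[j] = acc[j] * corr + p * V[i][j]; // vec_scale + vec_mad step
        l = l * corr + p;
        m = m_new;
    }
    for (float &x : acc) x /= l;  // final normalization
    return acc;
}
```

With two K/V rows this reproduces the plain softmax-weighted sum, which is a cheap way to sanity-check any vectorized rewrite against the scalar path.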
nullname c23ab465c0
feat: perf opt part4 (#43)
* wip

* refactor: rewrite dequantize_row_q4_0 by intrinsic

* log for debug

* fix q4 intrinsic

* small opt

* wip

* wip

* add vtcm_quota_size

* add perf log for hexagon-npu backend

* wip

* add log

* sync after a specific op

* increase worker thread priority

* fix unbalanced thread slice

* small slice to fit in vtcm cache

* limit the supported row element size

* opt 4_0 dequant

* fix q4 dequant

* add power_utils

* add rms_norm

* wip

* enable rms_norm f32

* fix rms_norm with param

* fix compiling flags

* use float

* fix small row size

* vectorized rms norm

* wip

* read 2 vectors

* rename

* add perf log on update

* set handle for empty tensors as well

* merge some rpc functions

* opt param update

* wip

* print more log

* add struct for update param config

* add npu_device_graph_set_tensor_with_param

* merge tensor and params update

* wip

* wip

* make as template to reuse

* vectorize dequantize_row_q8_0

* opt

* avoid using union to store q data

* wip

* wip

* wip
2025-05-28 00:00:42 +08:00
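The dequantize_row_q4_0 work above targets ggml's Q4_0 block format: 32 values per block, one scale, and 16 bytes of packed nibbles where element j takes the low nibble of byte j and element j+16 the high nibble, each offset by -8. A scalar reference sketch — struct and function names are illustrative, the fp16 scale is simplified to float here, and the actual commits replace this loop with HVX intrinsics:

```cpp
#include <cassert>
#include <cstdint>

// Simplified Q4_0 block: ggml stores `d` as fp16; float is used here
// purely for illustration.
struct block_q4_0_sketch {
    float   d;       // per-block scale
    uint8_t qs[16];  // 32 packed 4-bit quants
};

void dequantize_row_q4_0_sketch(const block_q4_0_sketch *x, float *y, int nblocks) {
    for (int i = 0; i < nblocks; ++i) {
        for (int j = 0; j < 16; ++j) {
            // low nibble -> first half of the block, high nibble -> second half
            y[i * 32 + j]      = ((x[i].qs[j] & 0x0F) - 8) * x[i].d;
            y[i * 32 + j + 16] = ((x[i].qs[j] >> 4)   - 8) * x[i].d;
        }
    }
}
```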
nullname 2306f82a58 fix compiling error 2025-05-27 06:35:41 +00:00
nullname 295f7f5957
feat: perf opt part3 (#42)
* add f16 support to elt wise op

* wip

* Revert "wip"

This reverts commit efa88deb0e8265614fd91db3c3dba777c00e858b.

* qf32 for mul

* wip

* Revert "wip"

This reverts commit bb419f89ca4599470d61d636fe6fa1e033d62748.

* disable fp16 add/sub

* template trick

* wip

* add f16 mulmat

* add log

* fix view liked op

* add log

* fix f16 mulmat

* add quant type

* wip

* add l2fetch

* add vtcm_mem

* wip

* fix fetch

* use vtcm cache in mulmat

* revert vtcm cache

* cache plane

* small opt for plane cache

* cache plane for some element wise op

* wip

* enable fetch even on vtcm

* wip

* copy sysMonApp

* small opt

* init ltu

* add compute_params

* add op common header

* move vtcm_mem allocation to compute_param

* fallback to memcache when vtcm allocate failed

* pre-calculate quantize type

* wip

* try fix test failure

* try fix mulmat nan

* fix inf in mulmat

* remove debug logs

* wip

* small refactoring on the dequant row func

* fix typo

* improve logging

* add q4_0 and q8_0

* wip

* wip

* build hexagon libs in cmake

* wip

* fix qnn only build flag

* fix typo

* fix todo

* wip

* wip

* add to_float

* use to_float directly instead of ltu

* wip

* cache f16_to_f32 table into vtcm

* print tensor dims at log

* init device in supports_op_impl

* revert cache ltu

* wip

* wip

* fix graph calc issues by validate cache manually after each op

* add cache invalidate func

* enable cache fallback only in quantize tensors

* add option to disable quantized tensors

* propagate the asan flag to npu build

* fix asan option

* wip

* invalidate tensors after finished

* implement backend_buffer_reset

* wip

* wip

* refactoring plane cache mechanism

* wip

* split row elements across thread

* use table for f16 to f32 conversion

* sync after each op

* small refactoring to invalidate l2 cache

* wip

* opt on float fetching

* unroll for loop manually

* reduce vtcm usage

* add perf tracking for npu

* print dimensions for profiler log

* wip

* wip

* wip

* add sub proc tracker

* fix typo

* print pcycles

* wip

* wip

* prefetch rows

* add l2fetch_row

* small tweak based on perf tracer

* opt l2 fetching

* wip
2025-05-16 19:57:33 +08:00
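Several bullets above ("use table for f16 to f32 conversion", "cache f16_to_f32 table into vtcm") describe a 64K-entry lookup table so per-element half-to-float conversion becomes a single load. A self-contained sketch of that idea under those assumptions — names are hypothetical, not the project's actual code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Plain bit-twiddling half->float, run once per possible bit pattern to
// fill the table (handles zero/subnormal, normal, and inf/nan encodings).
float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) & 1u;
    uint32_t exp  = (uint32_t)(h >> 10) & 0x1Fu;
    uint32_t mant = (uint32_t)(h & 0x3FFu);
    if (exp == 0) {  // zero / subnormal: value = mant * 2^-24
        float v = mant * (1.0f / 16777216.0f);
        return sign ? -v : v;
    }
    uint32_t bits = (sign << 31) |
                    ((exp == 31 ? 255u : exp + 112u) << 23) |  // rebias 15 -> 127
                    (mant << 13);
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// Build the 65536-entry table once (the commits cache it in VTCM).
std::vector<float> build_f16_table() {
    std::vector<float> t(65536);
    for (uint32_t i = 0; i < 65536; ++i) t[i] = half_to_float((uint16_t)i);
    return t;
}

// Row conversion is then a pure table lookup.
void f16_row_to_f32(const uint16_t *src, float *dst, size_t n,
                    const std::vector<float> &table) {
    for (size_t i = 0; i < n; ++i) dst[i] = table[src[i]];
}
```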
hongruichen db2a125438 fix GGML_QNN_ENABLE_PERFORMANCE_TRACKING option 2025-05-13 20:18:09 +08:00
hongruichen 02af8ff653 fix qnn only build flag 2025-05-08 21:28:11 +08:00
hongruichen 0ce53ce7cd fix linking error 2025-05-08 12:19:40 +08:00
hongruichen 039f835410 fix compiling error 2025-05-08 10:17:48 +08:00
hongruichen 161c4ee124 fix typo 2025-05-08 01:20:41 +08:00
nullname c2b6fec63f
feat: perf opt part2 (#39)
* add qurt_thread

* add thread pool

* add thread_pool obj at device ctx

* wip

* small refactoring to fit the thread pool structure

* set start/end threads for add

* init thread pool

* fix thread creation

* split complete and pending signals

* opt mulmat

* wip

* 2 threads

* back to 4 threads

* use barrier

* remove some unnecessary package

* add multi thread support for mul mat

* wip

* use qurt_barrier_t instead of qurt_signal_t

* wip

* wip

* add log

* split qnn cmake config

* create function to calculate the start and end func

* wip

* fix comment

* fix comment

* fix comment

* wip

* fix typo
2025-04-27 17:43:32 +08:00
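The "create function to calculate the start and end" and "fix unbalanced thread slice" bullets suggest a balanced work-splitting helper for the thread pool; a minimal sketch, assuming the split is over row indices (the name `thread_slice` is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Split `total` rows across `nthreads` workers, spreading the remainder
// one row at a time over the first threads so no slice is more than one
// row larger than any other.
std::pair<int, int> thread_slice(int total, int nthreads, int tid) {
    int base  = total / nthreads;
    int rem   = total % nthreads;
    int start = tid * base + std::min(tid, rem);
    int end   = start + base + (tid < rem ? 1 : 0);
    return {start, end};  // half-open range [start, end)
}
```

For 10 rows on 4 threads this yields slices of 3, 3, 2 and 2 rows, instead of the 2, 2, 2, 4 a naive `total / nthreads` split with a fat last slice produces.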
nullname beff5c4b78
feat: op perf opt (#38)
* add op define xml

* copy qnn libs in cmake

* fix htp skel path

* add windows copy file list

* wip

* add generated package

* remove unused params

* add cmake list

* set qnn sdk and hexagon sdk path

* wip

* wip

* fix tools version

* fix compiling error

* fix dims calc

* wip

* add mulmat 2d

* wip

* reduction

* wip

* wip

* fix compiling error in x64

* wip

* fix device description in emulator

* wip

* add flag

* copy necessary libs

* wip

* load HtpPrepare first for emulator

* enable custom op for 2d matrix

* verify op config before add to node

* Revert "verify op config before add to node"

This reverts commit 206dec826e560625e053c4c78e023994f993526e.

* wip

* wip

* wip

* revert tool version change

* use hexagon sdk version 5.5.0

https://docs.qualcomm.com/bundle/publicresource/topics/80-77512-2/release-notes-wrapper.html?product=1601111740010422#5.5.0

* wip

* move to sub dir

* add hexagon npu device and server lib

* fix npu lib build

* refactoring: rename QNNBackend enum

* fix compiling error

* wip

* remove qnn/backend.hpp

* add hexagon dsp host layer

* extract rpc_mem from qnn submodule

* fix dsp compiling error

* wip

* wip

* open and close npu device

* split objects into separated files

* fix linking error

* add npu_tensor

* add host graph

* map rpc buffer before usage

* fix some todos

* add shared module

* split rpc_interface from rpc_mem

* get get_dsp_arch from device

* wip

* rename host classes

* fix hexagon sdk arch getter

* fix device open

* fix linking error

* fix crash

* use tensor_data_type

* fix npu lib crash

* fix debug log print

* skip empty graph

* wip

* add log

* fix unmap fail

* fix tensor set

* remove some logs

* flush back memory after finished

* fix nb

* wip

* wip

* add helper function

* impl add op

* fix some add in test-backend-ops

* add elt wise sub and mul

* fix crash on some inplace op

* wip

* fix elt wise op calc

* wip

* split mul_mat into file

* add caps array

* wip

* wip

* print support/unsupport op

* copy lldb-server for newer android sdk

* add tensor_spec

* add assert

* fix crash when loading model

* rename cmake option

* fix name

* fix device memory and description

* fix compiling error on qnn only build

* fix some potential UBs

* fix comments
2025-04-21 12:06:16 +08:00
hongruichen 9e41f79403 fix compiling error after merge master 2025-04-16 11:16:26 +08:00
hongruichen 1caca627ea fix compiling error after merge 2025-03-22 12:51:09 +08:00
nullname a1ab67478f
[feat] add more op (#35)
* move op key generate function to kOpCaps

* fix op desc print

* try fix rms_norm

* Revert "try fix rms_norm"

This reverts commit 33b296098012909cb482fc29b52b28098dc971cd.

* add quantization type support by converting them to float

* enable quantization tensor for mulmat in gpu/npu

* fix asan error

* add log and assert

* insert output convert operator after mulmat

* add log

* fix some error in running

* disable permute again

* add log

* add error function

* Revert "add error function"

This reverts commit f92ff47798ac8053fb776c55efbb1a98469c7af1.

* add log

* more log

* disable convert op in graph

* wip

* add f16 config for graph

* set f16 precision for f16 graph

* fix override data type

* add comment

* add config flag to enable quantize type

* add log

* more quantized type for cpu and gpu backend

* enable all quant types for cpu and gpu backend

* rename

* wip

* add log

* remove unused functions

* skip permute

* remove get_qnn_op_input_param_count

* fallback to generic_get_op_desc if no op_desc

* revert 'skip permute'

* Revert "revert 'skip permute'"

This reverts commit 5761e31fd23c69c4cabf6fd9fac1a0d3e5a74968.

* wip

* add log

* print qnn tensor type

* add log

* limit the max size of tensor

* add log

* fix tensor size limiter

* small improve on tensor info printer

* disable sqrt and div to pass test-backend-ops for 8 gen 2

* remove debug log in release build

* add log

* skip permute in src

* wip

* disable reshape

* skip mul at decoder start

* wip

* add log

* add qnn_scoped_timer

* add perf tracker in graph

* add cmake options GGML_QNN_ENABLE_PERFORMANCE_TRACKING

* fix flag name

* use milli-second

* wip

* fix comment string

* add file for profiler

* change qnn-cpu to GGML_BACKEND_DEVICE_TYPE_ACCEL, so that we can run tests on cpu

* wip

* profiler: refactoring

* wip

* add implement for print_profile_events

* set-up profiler for graph

* set profiler to graph execute

* pretty print events

* unified log print prefix

* print event count

* enable optrace

* print duration at event end

* wip

* add more detailed soc information

* wip

* move device caps array into qnn-lib.cpp

* remove lib_name in device_context

* move get_graph_key_from_cgraph to graph.cpp

* add override type for tensor key

* use override_type instead of original data type for graph key

* append op type to tensor name to fix error in qwen

* remove todo

* wip
2025-03-22 12:34:31 +08:00
hongruichen 31847c8301 fix compiling error after merge 2025-03-05 22:25:36 +08:00
nullname 8b652dd6ec
bug: fix benchmark debug warning (#31)
* print build type

* wip

* print compiling flags

* wip

* wip
2025-02-28 22:54:57 +08:00
nullname f289752664
[bugfix]make sure single node op will have the same type (#29)
* debug

* disable reshape

* make sure single node op have same type

* fix warning at the logger

* Revert "disable reshape"

This reverts commit 5aeca4ba9bec6db3f047f9da803df20f9f6612b3.
2025-02-28 19:18:16 +08:00
nullname c867641222
feat: fix some TODO item in upstream PR #26 (#27)
* fix warning

* wip

* add todo for graph key generate

* rename some file to meet upstream guideline

* remove local .clang-format

* extend supported/unsupported counter to all ops

* append device name to log

* port to ggml logger

* fix warning after adapt to ggml logger

* append \n to all log

* use cast op instead of convert

* Revert "use cast op instead of convert"

This reverts commit e662fc2dfee41719aaf7bc9d75e03e8d0f7ded0f.

* fix op that needs same shape

* opt kQnnOpsTable

* refresh params name field when getting op config

* opt npu log print

* remove unused functions
2025-02-27 23:16:08 +08:00
nullname ff033e1e23
opt mulmat base on official doc (#25)
https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md
2025-02-25 19:46:48 +08:00
nullname a822d00753
feat: run on win (#24)
* move qnn_instance function implementation into cpp

* wip

* wip

* move dl related function into separated file

* use cast op for gpu

* Revert "use cast op for gpu"

This reverts commit 05df7362a15c022d05940d682e84cf480a082c6a.

* Reapply "use cast op for gpu"

This reverts commit 2520e5922a216faceb6d7efcde23dafe6947a4b3.

* fix compiling error in win

* fix align_alloc in win

* fix compiling error

* add get sys free/total mem for win

* wip

* suppress warning in win

* add missing chrono header

* set the correct qnn lib name for windows

* add flag to control cpu backend

* wip

* wip

* Revert "Reapply "use cast op for gpu""

This reverts commit f56519c374a7d46faac706cf214de48ff5fc5139.

* fix compiling error for linux build

* fix cdsprpc dynamic library name

* wip

* skip rpc load fail

* fix page_align_alloc

* suppress some warning in gcc

* wip

* reuse align to function

* more log

* add log and fix warning

* wip

* fix asan errors and memory leaks

* fix the get_io_tensors_from_graph

* improve comment

* print GGML_QNN_DEFAULT_LIB_SEARCH_PATH

* revert some unused changes

* move library search path setter into qnn module

* fix android library loading

* skip qnn_device_get_platform_info for npu emulator
2025-02-24 10:47:47 +08:00
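The align_alloc and page_align_alloc fixes above revolve around rounding an allocation size up to an alignment boundary before calling the platform allocator; a minimal helper sketch (the name `align_up` is hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// Round `size` up to the next multiple of `align`.
// Assumes `align` is a power of two, so the mask trick is valid.
constexpr size_t align_up(size_t size, size_t align) {
    return (size + align - 1) & ~(align - 1);
}
```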
nullname 10bd671c08
[feat]add more op support (#18)
* disable rpc buffer for npu

* append input/output tensor size into unsupported op log

* log dimensions for unsupported tensor

* wip

* split op config classes into separated file

* fix reshape

* wip

* add op_constructor_with_type_param

* set parameter for op_constructor_with_type_param func
2025-01-18 22:15:27 +08:00
hongruichen 5f93376f67 fix compiling error after merged 2025-01-10 11:30:03 +08:00
nullname f2d8d017da
[feat] Port ggml graph to QNN graph (#16)
* more log

* split graph implementation into cpp file

* rename: ggml_qnn_graph -> qnn_graph

* add input/output tensor to graph

* fix assert

* wip

* add _ggml_tensor field in qnn tensor

* add comments

* add set_data_buffer with raw memory buffer

* use set_data_buffer

* op param buffer use qnn_buffer_ptr

* add qnn_mem_buffer_slice

* use qnn_buffer_ptr as tensor buffer

* use new set_data_buffer to reduce copy

* ggml_qnn_op_config: add function to set input/output tensor before init node

* remove ggml_qnn_connectable_op_config and use ggml_qnn_single_op_config instead

* wip

* add initialize_op_nodes without tensor params

* wip

* add op caps table

* merge kGgmlOpToQnnOp and kOpCaps tables

* wip

* add cache parameter to create_tensors

* add init_from_ggml_graph

* disable gelu for all backend

* wip

* move op index calc to op config module

* use the ggml_tensor as parameter of build_graph

* add log

* use create_operation_from_op_tensor in old build_graph function

* remove unused constructors

* fix parameter count

* remove unused member func/var

* make init_from_ggml_graph as a class member: build_graph_from_ggml_graph

* move graph finalize into member function `finalize()`

* get graph key from ggml op tensor directly

* append output type

* reduce tensor key length

* add function to generate key from ggml_cgraph

* simplify graph cache insert and delete

* remove template param at get_qnn_graph_from_cache

* wip

* merge kQnnUnaryOpsTable and kQnnBinaryOpsTable

* refactor device_supports_op

* add log

* wip

* use framework function to check same shape

* wip

* extract some logic into separated function

* wip

* add execution function that runs graph

* add function to create qnn graph from ggml_cgraph with cache

* execute graph directly

* return null graph key for empty graph

* add more qualcomm chipset enums

* add cap for reshape

* disable some ops

* try to skip GGML_OP_VIEW

* more log for view tensor

* append param tensor into intermedia tensor key

* use 'ordered' set

* fix warning in release

* wip
2025-01-10 11:13:25 +08:00
hongruichen 79f124a699 add missing op 2024-12-14 15:49:44 +08:00
nullname e36ad89528
bugfix: error pre-allocated tensor (k_cache_view-0) (#12)
* fix device binding at ggml_backend_qnn_buffer_type

* merge ggml_backend_qnn_buffer_context and qnn_mem_buffer

* wip

* add log

* wip

* add qnn_buffer_ptr

* remove tailing `\n` at log

* add log

* enable GGML_OP_NONE

* wip

* wip

* disable tensor with view

* wip

* wip

* more log for view tensor

* re-enable view

* wip

* remove link android lib

* set dimension at bind function

* move graph traversal to backend-ops

* wip

* add get_view_internal_dimension to obtain the tensor view source dimension

* use _view_source_dimensions to allocate qnn tensor

* add place holder function ggml_backend_qnn_cpy_tensor_async

* add ggml_qnn_aggregate_op_config

* make matmul based on ggml_qnn_aggregate_op_config

* wip

* manually specify the order of op destruct

* skip register qnn-cpu backend

* disable view op again

* remove _view_source_dimensions

* add nop for reshape and view ops

* add log

* add comment
2024-12-11 10:42:00 +08:00
hongruichen 0d02ee09ed fix int overflow and remove view op to pass unit test 2024-12-03 10:55:11 +08:00
hongruichen c5e6549331 fix: fix assertion 2024-11-29 23:38:06 +08:00
hongruichen 09efaa389e define compile flag as module private 2024-11-29 17:24:05 +08:00
hongruichen 6d4feae579 redo conflict changes 2024-11-29 17:14:01 +08:00
hongruichen 5103b166ba bugfix: block large tensor calc in npu 2024-11-29 14:19:34 +08:00
nullname a2df09b6af
[WIP] feat: perf opt (#10)
* reduce log

* wip

* add function to create concat nodes

* opt

* insert concat node before mulmat

* use resize op

* wip

* add bind_buffer and remove ggml prefix in tensor types

* use gather node instead

* fix tensor type, now succeed in gpu and cpu, failed in npu

* add comment

* wip

* add comment

* wip

* in destructor, clear internal buffer before unbind

* disable gather for npu

* wip

* count swap memory as free memory

* wip

* fix supported_types

ggml_backend_device_i.supports_op will be invoked before ggml_backend_device_i.init_backend

* rename create_tensors -> initialize_op_nodes

* move ggml_qnn_op_config to separated file

* wip

* add create_convert_nodes

* add comment

* enable different type in/out for npu and cpu backend

* fix npu convert op

* enlarge max buffer size

* add more error code

* check tensor type before create convert node

* add log

* add log

* remove transpose0 and use buildin transpose flag

* rename transpose1 -> transpose_out

* disable convert for npu

* add more logs
2024-11-29 00:03:23 +08:00
nullname e6dbdacc32
feat: fix llama-bench (#7)
* remove unused functions

* wip

* init from last devices

* move init into constructor

* wip

* add static assert to device table

* make kDeviceCaps as constexpr

* get free memory and total memory

* add optimize flag for qnn backend
2024-11-13 17:06:46 +08:00
nullname 8ad86dc703
feat: add QNN_OP_TRANSPOSE (#6)
* redo: add convert nodes

This reverts commit 8448acd5ebf8fe86ab9d25313b64a15c811ef96e.

* align clang format with cann

* rename binary_op -> general_op

because there are some ops that will only take 1 param

* Revert "rename binary_op -> general_op"

This reverts commit 5be63b1a0dc4614457785367dade62158fe46214.

* wip

* add GGML_OP_PERMUTE

* add GGML_OP_VIEW and GGML_OP_GET_ROWS

* wip

* Revert "wip"

This reverts commit 772462ca6cfa01ea31bde725c2da60076ad9385f.
2024-11-04 23:12:03 +08:00
nullname fe565cfd9f
fix compiling error in release 2024-10-29 15:47:07 +08:00
hongruichen 5c1e6d4905 disable gelu in NPU 2024-10-29 00:54:08 +08:00
nullname 4abaf7d87e
feat: fix mulmat (#2)
* ggml_qnn_op_config now manages the construction of ggml_qnn_tensor

* wip

* add interface ggml_qnn_op_config

* add ggml_qnn_list_op_config

* add create_tensor and move tensor bind to execute

* wip

* rename: ggml_qnn_list_op_config -> ggml_qnn_matmul_op_config

* add tensortype to allow native tensor

* remove ggml_tensor param at ggml_qnn_tensor::create_tensor

* postpone the tensor id allocation to add_node

* add ggml_qnn_op_config_base

* trivial change to reduce the params of function

* split bind_tensors into bind_input_tensors and bind_output_tensors

* implement ggml_qnn_single_op_config::create_tensors

next will set the parameter of transpose

* tensor: add bind buffer

* add parameter tensor type

* implement add_tensor_param

* set qnn_instance only at constructor

* set transpose tensor param

* move create_op_constructor into op-config module

* create QNN_OP_MAT_MUL from ggml_qnn_matmul_op_config

* try fix crash

* fix compiling error at older ndk (r23c)

* fix crash

* fix parameter tensor name

* update tensor dimension assignment and add TODO

* fix mat_mul graph creating

* fix MUL_MAT_256x16x10x1_256x1x10x1_16x1x10x1

* append type to graph cache key

* wip

* fix supported op

* update comment

* disable op other than add and mat_mul

* add convert op to adapt multi input/output format

* disable f16 for cpu backend according to official doc

https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/cpu_backend.html#supported-operations

* add supported data types flags in each backend

* remove unused functions

* append output type to graph key

* fix gpu backend by disable the different data type op

* fix cpu backend support ops

* fix duplicated tensor name

* append op name

* suppress warning

* remove unused code
2024-10-28 12:48:16 +08:00
hongruichen 181cf52888 adapt new register backend interface and fix missing ops 2024-10-11 10:17:50 +08:00
hongruichen 1da8a3e678 fix compiling error after merge 2024-09-30 10:37:23 +08:00
Hongrui Chen a1ceaae4ad fix compiling error at older ndk (r23c) 2024-09-30 10:18:12 +08:00
hongruichen 481cb3a0c5 fix compiling error 2024-09-07 12:29:26 +08:00
みゃん dedadf2a20
Fixed a bug where debug code was included in the release, resulting i… (#1)
* Fixed a bug where debug code was included in the release, resulting in an undefined function error.

* Change the path of the QNN library when building in termux environment

* Revert "Change the path of the QNN library when building in termux environment"

This reverts commit c6e26a3679da2608940e2163e090adf75d667400.

* Changed so that GGML_QNN_DEFAULT_LIB_SEARCH_PATH can be set from command line arguments
2024-08-20 10:20:23 +08:00
hongruichen 47f6e02eda fix: try fix the tensor rank of mul mat 2024-07-31 23:54:07 +08:00
hongruichen 74eb05a13b feat: add ggml_qnn_op_config for handle different op 2024-07-31 20:22:37 +08:00
hongruichen 9a5f802bb6 refactoring: add convient macro to disable copy and move of class 2024-07-29 22:18:48 +08:00
hongruichen 6da82947df refactoring: set the default qnn lib search path at CMakeLists.txt by GGML_QNN_DEFAULT_LIB_SEARCH_PATH 2024-07-29 15:53:14 +08:00
hongruichen 1f9d2a7e22 refactoring: improve tensor print 2024-07-28 22:05:51 +08:00
hongruichen e33b5c9837 refactoring: print the name of unsupport op 2024-07-27 13:49:49 +08:00
hongruichen 8ab1f15fe3 refactoring: remove internal functions, use op table directly 2024-07-27 13:43:07 +08:00
hongruichen e0c9b34016 feat: check if dims equal for add
looks like qnn add can only be applied to matrices with equal dimensions
2024-07-27 13:38:12 +08:00