* move op key generate function to kOpCaps
* fix op desc print
* try fix rms_norm
* Revert "try fix rms_norm"
This reverts commit 33b296098012909cb482fc29b52b28098dc971cd.
* add quantization type support by converting quantized tensors to float
* enable quantized tensors for mulmat on gpu/npu
* fix asan error
* add log and assert
* insert output convert operator after mulmat
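The quantization bullets above boil down to one mechanism: quantized ggml tensors are dequantized to float before the QNN mulmat, and a convert is inserted after the op when the destination expects another type. A minimal sketch of the input side, with `to_float_fn` standing in for whatever per-type dequantization hook the backend actually uses (not a real ggml/QNN symbol):

```cpp
#include <cstddef>
#include <vector>

// placeholder for a per-type dequantization routine (assumption, not a real hook)
using to_float_fn = void (*)(const void * src, float * dst, size_t n);

struct op_input {
    const void * data;   // raw (possibly quantized) source data
    size_t       n;      // element count
    to_float_fn  to_f32; // null if the tensor is already float
};

// before handing a mulmat to QNN, materialize float inputs
std::vector<float> prepare_input(const op_input & in) {
    std::vector<float> f32(in.n);
    if (in.to_f32) {
        in.to_f32(in.data, f32.data(), in.n);  // dequantize -> float
    } else {
        auto * src = static_cast<const float *>(in.data);
        f32.assign(src, src + in.n);           // already float, plain copy
    }
    return f32;
}
```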
* add log
* fix some errors at runtime
* disable permute again
* add log
* add error function
* Revert "add error function"
This reverts commit f92ff47798ac8053fb776c55efbb1a98469c7af1.
* add log
* more log
* disable convert op in graph
* wip
* add f16 config for graph
* set f16 precision for f16 graph
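For the f16 graphs above, the QNN SDK exposes a graph-level precision config; a sketch of what setting it looks like, with macro and field names as I recall the SDK headers (treat them as an assumption, not verified against a specific SDK version):

```cpp
// mark the whole graph as fp16 math before finalizing it
QnnGraph_Config_t precision_config = QNN_GRAPH_CONFIG_INIT;
precision_config.option    = QNN_GRAPH_CONFIG_OPTION_PRECISION;
precision_config.precision = QNN_PRECISION_FLOAT16;

const QnnGraph_Config_t * graph_configs[] = { &precision_config, nullptr };
// graph_configs is then passed to graphCreate when building the f16 graph
```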
* fix override data type
* add comment
* add config flag to enable quantize type
* add log
* more quantized types for the cpu and gpu backends
* enable all quant types for the cpu and gpu backends
* rename
* wip
* add log
* remove unused functions
* skip permute
* remove get_qnn_op_input_param_count
* fallback to generic_get_op_desc if no op_desc
* revert 'skip permute'
* Revert "revert 'skip permute'"
This reverts commit 5761e31fd23c69c4cabf6fd9fac1a0d3e5a74968.
* wip
* add log
* print qnn tensor type
* add log
* limit the max size of tensor
* add log
* fix tensor size limiter
* small improve on tensor info printer
* disable sqrt and div to pass test-backend-ops for 8 gen 2
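Disabling ops per SoC is just an early-out in the backend's supports-op check, so ggml falls back to the CPU implementation for those ops. A sketch (the `is_8gen2` flag is illustrative; the real code keys off the device caps table):

```cpp
#include "ggml.h"

static bool device_supports_op_sketch(bool is_8gen2, const ggml_tensor * op) {
    switch (op->op) {
        case GGML_OP_SQRT:
        case GGML_OP_DIV:
            // these currently fail test-backend-ops on 8 Gen 2,
            // so reject them and let ggml fall back to the CPU backend
            return !is_8gen2;
        default:
            return true;
    }
}
```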
* remove debug log in release build
* add log
* skip permute in src
* wip
* disable reshape
* skip mul at decoder start
* wip
* add log
* add qnn_scoped_timer
* add perf tracker in graph
* add cmake options GGML_QNN_ENABLE_PERFORMANCE_TRACKING
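The scoped timer plus the cmake flag amount to an RAII stopwatch that compiles away in normal builds; a minimal sketch of the idea (not the backend's exact class):

```cpp
#include <chrono>
#include <cstdio>

class qnn_scoped_timer {
  public:
    explicit qnn_scoped_timer(const char * name)
        : _name(name), _start(std::chrono::steady_clock::now()) {}

    ~qnn_scoped_timer() {
        const auto   end = std::chrono::steady_clock::now();
        const double ms  = std::chrono::duration<double, std::milli>(end - _start).count();
        fprintf(stderr, "[profiler]%s: %.3f ms\n", _name, ms);  // duration in milliseconds
    }

  private:
    const char *                          _name;
    std::chrono::steady_clock::time_point _start;
};

// compiled out unless the cmake option defines the tracking macro
#ifdef GGML_QNN_ENABLE_PERFORMANCE_TRACKING
#    define QNN_SCOPED_TIMER(name) qnn_scoped_timer _qnn_timer_(name)
#else
#    define QNN_SCOPED_TIMER(name)
#endif
```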
* fix flag name
* use milliseconds
* wip
* fix comment string
* add file for profiler
* change qnn-cpu to GGML_BACKEND_DEVICE_TYPE_ACCEL, so that we can run tests on cpu
* wip
* profiler: refactoring
* wip
* add implement for print_profile_events
* set-up profiler for graph
* set profiler to graph execute
* pretty print events
* unified log print prefix
* print event count
* enable optrace
* print duration at event end
* wip
* add more detailed soc information
* wip
* move device caps array into qnn-lib.cpp
* remove lib_name in device_context
* move get_graph_key_from_cgraph to graph.cpp
* add override type for tensor key
* use override_type instead of original data type for graph key
* append op type to tensor name to fix error in qwen
* remove todo
* wip
* fix warning
* wip
* add todo for graph key generate
* rename some files to meet upstream guidelines
* remove local .clang-format
* extend supported/unsupported counters to all ops
* append device name to log
* port to ggml logger
* fix warning after adapt to ggml logger
* append \n to all log messages
* use cast op instead of convert
* Revert "use cast op instead of convert"
This reverts commit e662fc2dfee41719aaf7bc9d75e03e8d0f7ded0f.
* fix op that needs same shape
* opt kQnnOpsTable
* refresh params name field when getting op config
* opt npu log print
* remove unused functions
* move qnn_instance function implementation into cpp
* wip
* wip
* move dl related function into separated file
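The "dl" file is a thin platform shim: dlopen/dlsym/dlclose on POSIX, LoadLibraryA/GetProcAddress/FreeLibrary on Windows. A sketch with illustrative wrapper names:

```cpp
#ifdef _WIN32
#    include <windows.h>
using dl_handle = HMODULE;
static dl_handle dl_load(const char * path) { return LoadLibraryA(path); }
static void *    dl_sym(dl_handle h, const char * sym) {
    return reinterpret_cast<void *>(GetProcAddress(h, sym));
}
static void      dl_unload(dl_handle h) { FreeLibrary(h); }
#else
#    include <dlfcn.h>
using dl_handle = void *;
static dl_handle dl_load(const char * path) { return dlopen(path, RTLD_NOW | RTLD_LOCAL); }
static void *    dl_sym(dl_handle h, const char * sym) { return dlsym(h, sym); }
static void      dl_unload(dl_handle h) { dlclose(h); }
#endif
```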
* use cast op for gpu
* Revert "use cast op for gpu"
This reverts commit 05df7362a15c022d05940d682e84cf480a082c6a.
* Reapply "use cast op for gpu"
This reverts commit 2520e5922a216faceb6d7efcde23dafe6947a4b3.
* fix compile error in win
* fix align_alloc in win
* fix compile error
* add get sys free/total mem for win
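The two Windows fixes above map to well-known API pairs: `_aligned_malloc` vs `posix_memalign`, and `GlobalMemoryStatusEx` vs `sysinfo`. A sketch with illustrative wrapper names:

```cpp
#include <cstddef>
#include <cstdlib>

#ifdef _WIN32
#    include <windows.h>

static void * align_alloc(size_t alignment, size_t size) {
    return _aligned_malloc(size, alignment);  // note the swapped argument order
}

static void get_sys_mem(size_t & free_mem, size_t & total_mem) {
    MEMORYSTATUSEX status = {};
    status.dwLength = sizeof(status);
    GlobalMemoryStatusEx(&status);
    free_mem  = (size_t) status.ullAvailPhys;
    total_mem = (size_t) status.ullTotalPhys;
}
#else
#    include <sys/sysinfo.h>

static void * align_alloc(size_t alignment, size_t size) {
    void * ptr = nullptr;
    return posix_memalign(&ptr, alignment, size) == 0 ? ptr : nullptr;
}

static void get_sys_mem(size_t & free_mem, size_t & total_mem) {
    struct sysinfo info = {};
    sysinfo(&info);
    free_mem  = (size_t) info.freeram * info.mem_unit;
    total_mem = (size_t) info.totalram * info.mem_unit;
}
#endif
```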
* wip
* suppress warning in win
* add missing chrono header
* set the correct qnn lib name for windows
* add flag to control cpu backend
* wip
* wip
* Revert "Reapply "use cast op for gpu""
This reverts commit f56519c374a7d46faac706cf214de48ff5fc5139.
* fix compile error for linux build
* fix cdsprpc dynamic library name
* wip
* skip rpc load failure
* fix page_align_alloc
* suppress some warning in gcc
* wip
* reuse align to function
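The shared rounding helper is the usual power-of-two trick, reused by page_align_alloc and the buffer code; for example `align_to(4096, 5)` yields 4096:

```cpp
#include <cstddef>

// round value up to the next multiple of alignment
// (alignment is assumed to be a power of two)
static size_t align_to(size_t alignment, size_t value) {
    return (value + alignment - 1) & ~(alignment - 1);
}
```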
* more log
* add log and fix warning
* wip
* fix asan errors and memory leaks
* fix the get_io_tensors_from_graph
* improve comment
* print GGML_QNN_DEFAULT_LIB_SEARCH_PATH
* revert some unused changes
* move library search path setter into qnn module
* fix android library loading
* skip qnn_device_get_platform_info for npu emulator
* more log
* split graph implementation into cpp file
* rename: ggml_qnn_graph -> qnn_graph
* add input/output tensors to graph
* fix assert
* wip
* add _ggml_tensor field in qnn tensor
* add comments
* add set_data_buffer with raw memory buffer
* use set_data_buffer
* op param buffer use qnn_buffer_ptr
* add qnn_mem_buffer_slice
* use qnn_buffer_ptr as tensor buffer
* use new set_data_buffer to reduce copy
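The buffer bullets above converge on one abstraction: a common buffer interface that tensors bind through, with an owning heap buffer and a non-owning slice so `set_data_buffer` can wrap caller memory without copying. A simplified sketch of the shape (not the backend's exact classes):

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

class qnn_buffer_interface {
  public:
    virtual ~qnn_buffer_interface() = default;
    virtual uint8_t * get_buffer()     = 0;
    virtual size_t    get_size() const = 0;
};

using qnn_buffer_ptr = std::shared_ptr<qnn_buffer_interface>;

// owning buffer backed by plain heap memory
class qnn_mem_buffer : public qnn_buffer_interface {
  public:
    explicit qnn_mem_buffer(size_t size) : _size(size), _data(new uint8_t[size]) {}
    uint8_t * get_buffer() override { return _data.get(); }
    size_t    get_size() const override { return _size; }

  private:
    size_t                     _size;
    std::unique_ptr<uint8_t[]> _data;
};

// non-owning window over caller memory; lets set_data_buffer skip the copy
class qnn_mem_buffer_slice : public qnn_buffer_interface {
  public:
    qnn_mem_buffer_slice(uint8_t * data, size_t size) : _data(data), _size(size) {}
    uint8_t * get_buffer() override { return _data; }
    size_t    get_size() const override { return _size; }

  private:
    uint8_t * _data;
    size_t    _size;
};
```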
* ggml_qnn_op_config: add function to set input/output tensor before init node
* remove ggml_qnn_connectable_op_config and use ggml_qnn_single_op_config instead
* wip
* add initialize_op_nodes without tensor params
* wip
* add op caps table
* merge kGgmlOpToQnnOp and kOpCaps tables
* wip
* add cache parameter to create_tensors
* add init_from_ggml_graph
* disable gelu for all backend
* wip
* move op index calc to op config module
* use the ggml_tensor as parameter of build_graph
* add log
* use create_operation_from_op_tensor in old build_graph function
* remove unused constructors
* fix parameter count
* remove unused member func/var
* make init_from_ggml_graph a class member: build_graph_from_ggml_graph
* move graph finalize into member function `finalize()`
* get graph key from ggml op tensor directly
* append output type
* reduce tensor key length
* add function to generate key from ggml_cgraph
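The graph-key work above concatenates each node's op plus shortened tensor descriptors (using the override type and appending the output type) so that structurally identical cgraphs map to one cached QNN graph. A rough sketch of the scheme, assuming backend code with access to the ggml_cgraph internals (ggml-impl.h in newer trees):

```cpp
#include <cstdio>
#include <string>

#include "ggml.h"

static void append_tensor_key(const ggml_tensor * t, std::string & key) {
    char buf[64];
    // shortened key: leading dims + type name
    snprintf(buf, sizeof(buf), "%lldx%lld%s", (long long) t->ne[0], (long long) t->ne[1],
             ggml_type_name(t->type));
    key += buf;
}

static std::string get_graph_key(const ggml_cgraph * cgraph) {
    std::string key;
    for (int i = 0; i < cgraph->n_nodes; ++i) {
        const ggml_tensor * node = cgraph->nodes[i];
        if (node->op == GGML_OP_NONE) {
            continue;  // no-op nodes do not change the graph structure
        }
        key += ggml_op_name(node->op);
        key += '(';
        for (int j = 0; j < GGML_MAX_SRC && node->src[j]; ++j) {
            append_tensor_key(node->src[j], key);
            key += ',';
        }
        key += ')';
    }
    if (cgraph->n_nodes > 0) {
        append_tensor_key(cgraph->nodes[cgraph->n_nodes - 1], key);  // append output type
    }
    return key;
}
```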
* simplify graph cache insert and delete
* remove template param at get_qnn_graph_from_cache
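With the key in hand, the cache flow is a single lookup with create-on-miss, and dropping the template parameter means a plain callable can be passed in. A sketch (`qnn_graph` is a stand-in for the backend's graph wrapper):

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

struct qnn_graph {};  // stand-in for the backend's graph wrapper

using graph_cache_t   = std::unordered_map<std::string, std::unique_ptr<qnn_graph>>;
using graph_builder_t = std::function<std::unique_ptr<qnn_graph>(const std::string &)>;

static qnn_graph * get_qnn_graph_from_cache(graph_cache_t & cache, const std::string & key,
                                            const graph_builder_t & build) {
    auto it = cache.find(key);
    if (it != cache.end()) {
        return it->second.get();  // hit: reuse the finalized graph
    }
    auto graph = build(key);      // miss: build + finalize a new graph
    if (!graph) {
        return nullptr;           // build failed, cache stays untouched
    }
    qnn_graph * raw = graph.get();
    cache.emplace(key, std::move(graph));
    return raw;
}
```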
* wip
* merge kQnnUnaryOpsTable and kQnnBinaryOpsTable
* refactor device_supports_op
* add log
* wip
* use framework function to check same shape
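The "framework function" here is ggml's own shape check, which replaces hand-rolled ne[] comparisons in the supports-op path:

```cpp
#include "ggml.h"

// ops that this backend runs without broadcasting must have matching shapes
static bool same_shape_inputs(const ggml_tensor * op) {
    return ggml_are_same_shape(op->src[0], op->src[1]);
}
```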
* wip
* extract some logic into separated function
* wip
* add execution function that runs graph
* add function to create qnn graph from ggml_cgraph with cache
* execute graph directly
* return null graph key for empty graph
* add more qualcomm chipset enums
* add cap for reshape
* disable some ops
* try to skip GGML_OP_VIEW
* more log for view tensor
* append param tensor into intermediate tensor key
* use 'ordered' set
* fix warning in release
* wip
* ggml_qnn_op_config now manages the construction of ggml_qnn_tensor
* wip
* add interface ggml_qnn_op_config
* add ggml_qnn_list_op_config
* add create_tensor and move tensor bind to execute
* wip
* rename: ggml_qnn_list_op_config -> ggml_qnn_matmul_op_config
* add tensor type to allow native tensor
* remove ggml_tensor param at ggml_qnn_tensor::create_tensor
* postpone the tensor id allocation to add_node
* add ggml_qnn_op_config_base
* trivial change to reduce the function's parameter count
* split bind_tensors into bind_input_tensors and bind_output_tensors
* implement ggml_qnn_single_op_config::create_tensors
next will set the parameter of transpose
* tensor: add bind buffer
* add parameter tensor type
* implement add_tensor_param
* set qnn_instance only at constructor
* set transpose tensor param
* move create_op_constructor into op-config module
* create QNN_OP_MAT_MUL from ggml_qnn_matmul_op_config
* try fix crash
* fix compile error on older ndk (r23c)
* fix crash
* fix parameter tensor name
* update tensor dimension assignment and add TODO
* fix mat_mul graph creating
* fix MUL_MAT_256x16x10x1_256x1x10x1_16x1x10x1
* append type to graph cache key
* wip
* fix supported op
* update comment
* disable op other than add and mat_mul
* add convert op to adapt differing input/output formats
* disable f16 for cpu backend according to official doc
https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/cpu_backend.html#supported-operations
* add supported data types flags in each backend
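The per-backend type flags are a bitmask each device advertises and supports-op tests inputs against; for instance, per the Qualcomm doc above, the CPU backend's mask would omit f16. The mask values below are illustrative, not the backend's real tables:

```cpp
#include <cstdint>

#include "ggml.h"

enum qnn_type_flags : uint32_t {
    TYPE_F32  = 1u << 0,
    TYPE_F16  = 1u << 1,
    TYPE_Q8_0 = 1u << 2,
    TYPE_Q4_0 = 1u << 3,
};

static uint32_t type_to_flag(ggml_type type) {
    switch (type) {
        case GGML_TYPE_F32:  return TYPE_F32;
        case GGML_TYPE_F16:  return TYPE_F16;
        case GGML_TYPE_Q8_0: return TYPE_Q8_0;
        case GGML_TYPE_Q4_0: return TYPE_Q4_0;
        default:             return 0;
    }
}

static bool device_supports_type(uint32_t device_type_mask, ggml_type type) {
    return (device_type_mask & type_to_flag(type)) != 0;
}
```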
* remove unused functions
* append output type to graph key
* fix gpu backend by disabling ops with mismatched data types
* fix cpu backend support ops
* fix duplicated tensor name
* append op name
* suppress warning
* remove unused code