Commit Graph

5 Commits

Author SHA1 Message Date
nullname f2d8d017da
[feat] Port ggml graph to QNN graph (#16)
* more logs

* split graph implementation into cpp file

* rename: ggml_qnn_graph -> qnn_graph

* add input/output tensors to graph

* fix assert

* wip

* add _ggml_tensor field in qnn tensor

* add comments

* add set_data_buffer with raw memory buffer

* use set_data_buffer

* op param buffer use qnn_buffer_ptr

* add qnn_mem_buffer_slice

* use qnn_buffer_ptr as tensor buffer

* use new set_data_buffer to reduce copy

* ggml_qnn_op_config: add function to set input/output tensor before init node

* remove ggml_qnn_connectable_op_config and use ggml_qnn_single_op_config instead

* wip

* add initialize_op_nodes without tensor params

* wip

* add op caps table

* merge kGgmlOpToQnnOp and kOpCaps tables

* wip

* add cache parameter to create_tensors

* add init_from_ggml_graph

* disable gelu for all backends

* wip

* move op index calc to op config module

* use the ggml_tensor as parameter of build_graph

* add log

* use create_operation_from_op_tensor in old build_graph function

* remove unused constructors

* fix parameter count

* remove unused member func/var

* make init_from_ggml_graph a class member: build_graph_from_ggml_graph

* move graph finalize into member function `finalize()`

* get graph key from ggml op tensor directly

* append output type

* reduce tensor key length

* add function to generate key from ggml_cgraph (see the key sketch after this entry)

* simplify graph cache insert and delete

* remove template param at get_qnn_graph_from_cache

* wip

* merge kQnnUnaryOpsTable and kQnnBinaryOpsTable

* refactor device_supports_op

* add log

* wip

* use framework function to check same shape

* wip

* extract some logic into a separate function

* wip

* add execution function that runs graph

* add function to create qnn graph from ggml_cgraph with cache

* execute graph directly

* return null graph key for empty graph

* add more qualcomm chipset enums

* add cap for reshape

* disable some ops

* try to skip GGML_OP_VIEW

* more logs for view tensor

* append param tensor to intermediate tensor key

* use 'ordered' set

* fix warning in release

* wip
2025-01-10 11:13:25 +08:00
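
The graph-key commits above ("get graph key from ggml op tensor directly", "append output type", "add function to generate key from ggml_cgraph", "return null graph key for empty graph") suggest a scheme along these lines. A minimal sketch, with an assumed helper name and key layout, not the PR's actual code:

```cpp
#include <string>
#include "ggml.h"

// Build a cache key for a ggml_cgraph by concatenating each node's op
// name, dimensions, and output type. Hypothetical helper for illustration.
static std::string make_graph_key(const ggml_cgraph * cgraph) {
    std::string key;
    for (int i = 0; i < cgraph->n_nodes; ++i) {
        const ggml_tensor * node = cgraph->nodes[i];
        if (node->op == GGML_OP_NONE) {
            continue; // placeholder nodes do not affect the compiled graph
        }
        key += ggml_op_name(node->op);
        for (int d = 0; d < GGML_MAX_DIMS; ++d) {
            key += '_';
            key += std::to_string(node->ne[d]);
        }
        key += ggml_type_name(node->type); // "append output type"
        key += ';';
    }
    return key; // empty key signals an empty (uncacheable) graph
}
```
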
nullname e36ad89528
bugfix: error with pre-allocated tensor (k_cache_view-0) (#12)
* fix device binding at ggml_backend_qnn_buffer_type

* merge ggml_backend_qnn_buffer_context and qnn_mem_buffer

* wip

* add log

* wip

* add qnn_buffer_ptr

* remove trailing `\n` in logs

* add log

* enable GGML_OP_NONE

* wip

* wip

* disable tensor with view

* wip

* wip

* more logs for view tensor

* re-enable view

* wip

* remove android lib linking

* set dimension at bind function

* move graph traversal to backend-ops

* wip

* add get_view_internal_dimension to obtain the tensor view source dimension (see the sketch after this entry)

* use _view_source_dimensions to allocate qnn tensor

* add placeholder function ggml_backend_qnn_cpy_tensor_async

* add ggml_qnn_aggregate_op_config

* base matmul on ggml_qnn_aggregate_op_config

* wip

* manually specify op destruction order

* skip registering qnn-cpu backend

* disable view op again

* remove _view_source_dimensions

* add nop for reshape and view ops

* add log

* add comment
2024-12-11 10:42:00 +08:00
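
The view-tensor commits above revolve around finding the dimensions of the tensor that actually owns a view's data. A minimal sketch of that idea, using a hypothetical helper modeled on get_view_internal_dimension:

```cpp
#include "ggml.h"

// Walk the view_src chain to the tensor that owns the underlying data,
// so the QNN tensor can be allocated with the source dimensions rather
// than the view's. Hypothetical helper, not the PR's exact code.
static const ggml_tensor * view_source(const ggml_tensor * tensor) {
    while (tensor->view_src != nullptr) {
        tensor = tensor->view_src;
    }
    return tensor;
}
```

Allocating from view_source(t)->ne instead of t->ne avoids sizing the backing buffer from the view's smaller shape, the kind of mismatch behind the k_cache_view-0 error in the title.
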
nullname a2df09b6af
[WIP] feat: perf opt (#10)
* reduce logs

* wip

* add function to create concat nodes

* opt

* insert concat node before mulmat

* use resize op

* wip

* add bind_buffer and remove ggml prefix from tensor types

* use gather node instead

* fix tensor type; now succeeds on gpu and cpu, still fails on npu

* add comment

* wip

* add comment

* wip

* in destructor, clear internal buffer before unbind

* disable gather for npu

* wip

* count swap memory as free memory (see the sketch after this entry)

* wip

* fix supported_types

ggml_backend_device_i.supports_op will be invoked before ggml_backend_device_i.init_backend

* rename create_tensors -> initialize_op_nodes

* move ggml_qnn_op_config to a separate file

* wip

* add create_convert_nodes

* add comment

* enable different input/output types for npu and cpu backends

* fix npu convert op

* enlarge max buffer size

* add more error code

* check tensor type before creating convert node

* add log

* add log

* remove transpose0 and use built-in transpose flag

* rename transpose1 -> transpose_out

* disable convert for npu

* add more logs
2024-11-29 00:03:23 +08:00
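
For "count swap memory as free memory", a plausible Linux/Android reading of /proc/meminfo is sketched below; the field handling is an assumption, not the PR's exact implementation:

```cpp
#include <cstddef>
#include <cstdio>

// Report free memory as MemFree + SwapFree, in bytes, by scanning
// /proc/meminfo. Sketch only; error handling is minimal.
static size_t get_free_memory_bytes() {
    std::FILE * f = std::fopen("/proc/meminfo", "r");
    if (f == nullptr) {
        return 0;
    }
    char line[128];
    unsigned long long total_kb = 0;
    while (std::fgets(line, sizeof(line), f) != nullptr) {
        unsigned long long value_kb = 0;
        if (std::sscanf(line, "MemFree: %llu kB", &value_kb) == 1 ||
            std::sscanf(line, "SwapFree: %llu kB", &value_kb) == 1) {
            total_kb += value_kb;
        }
    }
    std::fclose(f);
    return (size_t) total_kb * 1024;
}
```
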
nullname 8ad86dc703
feat: add QNN_OP_TRANSPOSE (#6)
* redo: add convert nodes

This reverts commit 8448acd5ebf8fe86ab9d25313b64a15c811ef96e.

* align clang format with cann

* rename binary_op -> general_op

because some ops take only one param

* Revert "rename binary_op -> general_op"

This reverts commit 5be63b1a0dc4614457785367dade62158fe46214.

* wip

* add GGML_OP_PERMUTE (see the sketch after this entry)

* add GGML_OP_VIEW and GGML_OP_GET_ROWS

* wip

* Revert "wip"

This reverts commit 772462ca6cfa01ea31bde725c2da60076ad9385f.
2024-11-04 23:12:03 +08:00
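
ggml stores dimensions innermost-first (ne[0] is the fastest-varying axis), so lowering GGML_OP_PERMUTE to QNN_OP_TRANSPOSE requires remapping the permutation if QNN lists axes outermost-first. A minimal sketch under that assumption; the helper is hypothetical:

```cpp
#include <array>
#include <cstdint>

// Map a ggml-style permutation (ggml_axes[i] = source ggml axis of output
// axis i) to QNN order for a rank-4 tensor, assuming QNN lists axes
// outermost-first while ggml lists them innermost-first.
static std::array<uint32_t, 4> to_qnn_perm(const std::array<int, 4> & ggml_axes) {
    std::array<uint32_t, 4> perm{};
    for (int i = 0; i < 4; ++i) {
        // reverse both the output axis index and the source axis index
        perm[3 - i] = (uint32_t) (3 - ggml_axes[i]);
    }
    return perm;
}
```
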
nullname 4abaf7d87e
feat: fix mulmat (#2)
* ggml_qnn_op_config now manages the construction of ggml_qnn_tensor

* wip

* add interface ggml_qnn_op_config

* add ggml_qnn_list_op_config

* add create_tensor and move tensor binding to execute

* wip

* rename: ggml_qnn_list_op_config -> ggml_qnn_matmul_op_config

* add tensor type to allow native tensors

* remove ggml_tensor param at ggml_qnn_tensor::create_tensor

* postpone the tensor id allocation to add_node

* add ggml_qnn_op_config_base

* trivial change to reduce the function's parameters

* split bind_tensors into bind_input_tensors and bind_output_tensors (see the sketch after this entry)

* implement ggml_qnn_single_op_config::create_tensors

next: set the parameters of transpose

* tensor: add bind buffer

* add parameter tensor type

* implement add_tensor_param

* set qnn_instance only at constructor

* set transpose tensor param

* move create_op_constructor into op-config module

* create QNN_OP_MAT_MUL from ggml_qnn_matmul_op_config

* try fix crash

* fix compile error on older ndk (r23c)

* fix crash

* fix parameter tensor name

* update tensor dimension assignment and add TODO

* fix mat_mul graph creation

* fix MUL_MAT_256x16x10x1_256x1x10x1_16x1x10x1

* append type to graph cache key

* wip

* fix supported op

* update comment

* disable ops other than add and mat_mul

* add convert op to adapt to mixed input/output formats

* disable f16 for cpu backend according to official doc

https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/cpu_backend.html#supported-operations

* add supported data types flags in each backend

* remove unused functions

* append output type to graph key

* fix gpu backend by disabling ops with different data types

* fix cpu backend supported ops

* fix duplicated tensor name

* append op name

* suppress warning

* remove unused code
2024-10-28 12:48:16 +08:00
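
The op-config commits above outline an interface: tensor construction moves into ggml_qnn_op_config, binding is split into input/output halves, and parameter tensors (e.g. the transpose permutation) get their own entry point. An illustrative reconstruction; the exact signatures are assumptions:

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct ggml_tensor; // from ggml.h

// Hypothetical shape of the ggml_qnn_op_config interface implied by the
// commit messages above, not the PR's actual declaration.
class ggml_qnn_op_config {
public:
    virtual ~ggml_qnn_op_config() = default;

    // create the QNN tensors backing this op's inputs/outputs
    virtual bool create_tensors() = 0;

    // bind_tensors split so inputs and outputs can be rebound independently
    virtual bool bind_input_tensors(const std::vector<ggml_tensor *> & inputs) = 0;
    virtual bool bind_output_tensors(const std::vector<ggml_tensor *> & outputs) = 0;

    // attach a static parameter tensor (e.g. the transpose permutation)
    virtual bool add_tensor_param(const std::string & name,
                                  const std::vector<uint32_t> & dims,
                                  const void * data) = 0;
};
```
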