Commit Graph

6 Commits

Author SHA1 Message Date
nullname a1ab67478f
[feat] add more op (#35)
* move op key generate function to kOpCaps

* fix op desc print

* try fix rms_norm

* Revert "try fix rms_norm"

This reverts commit 33b296098012909cb482fc29b52b28098dc971cd.

* add quantization type support by converting them to float

* enable quantization tensor for mulmat in gpu/npu

* fix asan error

* add log and assert

* insert output convert operator after mulmat

* add log

* fix some error in running

* disable permute again

* add log

* add error function

* Revert "add error function"

This reverts commit f92ff47798ac8053fb776c55efbb1a98469c7af1.

* add log

* more log

* disable convert op in graph

* wip

* add f16 config for graph

* set f16 precision for f16 graph

* fix override data type

* add comment

* add config flag to enable quantize type

* add log

* more quantized type for cpu and gpu backend

* enable all quant types for cpu and gpu backend

* rename

* wip

* add log

* remove unused functions

* skip permute

* remove get_qnn_op_input_param_count

* fallback to generic_get_op_desc if no op_desc

* revert 'skip permute'

* Revert "revert 'skip permute'"

This reverts commit 5761e31fd23c69c4cabf6fd9fac1a0d3e5a74968.

* wip

* add log

* print qnn tensor type

* add log

* limit the max size of tensor

* add log

* fix tensor size limiter

* small improve on tensor info printer

* disable sqrt and div to pass test-backend-ops for 8 gen 2

* remove debug log in release build

* add log

* skip permute in src

* wip

* disable reshape

* skip mul at decoder start

* wip

* add log

* add qnn_scoped_timer

* add perf tracker in graph

* add cmake options GGML_QNN_ENABLE_PERFORMANCE_TRACKING

* fix flag name

* use milli-second

* wip

* fix comment string

* add file for profiler

* change qnn-cpu to GGML_BACKEND_DEVICE_TYPE_ACCEL, so that we can run tests on cpu

* wip

* profiler: refactoring

* wip

* add implement for print_profile_events

* set-up profiler for graph

* set profiler to graph execute

* pretty print events

* unified log print prefix

* print event count

* enable optrace

* print duration at event end

* wip

* add more detailed soc information

* wip

* move device caps array into qnn-lib.cpp

* remove lib_name in device_context

* move get_graph_key_from_cgraph to graph.cpp

* add override type for tensor key

* use override_type instead of original data type for graph key

* append op type to tensor name to fix error in qwen

* remove todo

* wip
2025-03-22 12:34:31 +08:00
nullname c867641222
feat: fix some TODO item in upstream PR #26 (#27)
* fix warning

* wip

* add todo for graph key generate

* rename some file to meet upstream guideline

* remove local .clang-format

* expend supported/unsupported counter to all ops

* append device name to log

* port to ggml logger

* fix warning after adapt to ggml logger

* append \n to all log

* use case op instead of convert

* Revert "use case op instead of convert"

This reverts commit e662fc2dfee41719aaf7bc9d75e03e8d0f7ded0f.

* fix op that needs same shape

* opt kQnnOpsTable

* refresh params name field when getting op config

* opt npu log print

* remove unused functions
2025-02-27 23:16:08 +08:00
nullname a822d00753
feat: run on win (#24)
* move qnn_instance function implementation into cpp

* wip

* wip

* move dl related function into separated file

* use cast op for gpu

* Revert "use cast op for gpu"

This reverts commit 05df7362a15c022d05940d682e84cf480a082c6a.

* Reapply "use cast op for gpu"

This reverts commit 2520e5922a216faceb6d7efcde23dafe6947a4b3.

* fix compiling error in win

* fix align_alloc in win

* fix compiling error

* add get sys free/total mem for win

* wip

* suppress warning in win

* add missing chrono header

* set the correct qnn lib name for windows

* add flag to control cpu backend

* wip

* wip

* Revert "Reapply "use cast op for gpu""

This reverts commit f56519c374a7d46faac706cf214de48ff5fc5139.

* fix compiling error for linux build

* fix cdsprpc dynamic library name

* wip

* skip rpc load fail

* fix page_align_alloc

* suppress some warning in gcc

* wip

* reuse align to function

* more log

* add log and fix warning

* wip

* fix asan errors and memory leaks

* fix the get_io_tensors_from_graph

* improve comment

* print GGML_QNN_DEFAULT_LIB_SEARCH_PATH

* revert some unused changes

* move library search path setter into qnn module

* fix android library loading

* skip qnn_device_get_platform_info for npu emulator
2025-02-24 10:47:47 +08:00
nullname 10bd671c08
[feat]add more op support (#18)
* disable rpc buffer for npu

* append input/output tensor size into unsupported op log

* log dimensions for unsupported tensor

* wip

* split op config classes into separated file

* fix reshape

* wip

* add op_constructor_with_type_param

* set parameter for op_constructor_with_type_param func
2025-01-18 22:15:27 +08:00
hongruichen 5f93376f67 fix compiling error after merged 2025-01-10 11:30:03 +08:00
nullname f2d8d017da
[feat] Port ggml graph to QNN graph (#16)
* more log

* split graph implementation into cpp file

* rename: ggml_qnn_graph -> qnn_graph

* add imput/output tensor to graph

* fix assert

* wip

* add _ggml_tensor field in qnn tensor

* add comments

* add set_data_buffer with raw memory buffer

* use set_data_buffer

* op param buffer use qnn_buffer_ptr

* add qnn_mem_buffer_slice

* use qnn_buffer_ptr as tensor buffer

* use new set_data_buffer to reduce copy

* ggml_qnn_op_config: add function to set input/output tensor before init node

* remove ggml_qnn_connectable_op_config and use ggml_qnn_single_op_config instead

* wip

* add initialize_op_nodes without tensor params

* wip

* add op caps table

* merge kGgmlOpToQnnOp and kOpCaps tables

* wip

* add cache parameter to create_tensors

* add init_from_ggml_graph

* disable gelu for all backend

* wip

* move op index calc to op config module

* use the ggml_tensor as parameter of build_graph

* add log

* use create_operation_from_op_tensor in old build_graph function

* remove unused constructors

* fix parameter count

* remove unused member func/var

* make init_from_ggml_graph as a class member: build_graph_from_ggml_graph

* move graph finalize into member function `finalize()`

* get graph key from ggml op tensor directly

* append output type

* reduce tensor key length

* add function to generate key from ggml_cgraph

* simplify graph cache insert and delete

* remove template param at get_qnn_graph_from_cache

* wip

* merge kQnnUnaryOpsTable and kQnnBinaryOpsTable

* refactor device_supports_op

* add log

* wip

* use framework function to check same shape

* wip

* extract some logic into separated function

* wip

* add execution function that runs graph

* add function to create qnn graph from ggml_cgraph with cache

* execute graph directly

* return null graph key for empty graph

* add more qualcomm chipset enums

* add cap for reshape

* disable some ops

* try to skip GGML_OP_VIEW

* moew log for view tensor

* append param tensor into intermedia tensor key

* use 'ordered' set

* fix warning in release

* wip
2025-01-10 11:13:25 +08:00