llama.cpp

Commit Graph

Author	SHA1	Message	Date
nullname	a1ab67478f	[feat] add more op (#35 ) * move op key generate function to kOpCaps * fix op desc print * try fix rms_norm * Revert "try fix rms_norm" This reverts commit 33b296098012909cb482fc29b52b28098dc971cd. * add quantization type support by converting them to float * enable quantization tensor for mulmat in gpu/npu * fix asan error * add log and assert * insert output convert operator after mulmat * add log * fix some error in running * disable permute again * add log * add error function * Revert "add error function" This reverts commit f92ff47798ac8053fb776c55efbb1a98469c7af1. * add log * more log * disable convert op in graph * wip * add f16 config for graph * set f16 precision for f16 graph * fix override data type * add comment * add config flag to enable quantize type * add log * more quantized type for cpu and gpu backend * enable all quant types for cpu and gpu backend * rename * wip * add log * remove unused functions * skip permute * remove get_qnn_op_input_param_count * fallback to generic_get_op_desc if no op_desc * revert 'skip permute' * Revert "revert 'skip permute'" This reverts commit 5761e31fd23c69c4cabf6fd9fac1a0d3e5a74968. * wip * add log * print qnn tensor type * add log * limit the max size of tensor * add log * fix tensor size limiter * small improve on tensor info printer * disable sqrt and div to pass test-backend-ops for 8 gen 2 * remove debug log in release build * add log * skip permute in src * wip * disable reshape * skip mul at decoder start * wip * add log * add qnn_scoped_timer * add perf tracker in graph * add cmake options GGML_QNN_ENABLE_PERFORMANCE_TRACKING * fix flag name * use milli-second * wip * fix comment string * add file for profiler * change qnn-cpu to GGML_BACKEND_DEVICE_TYPE_ACCEL, so that we can run tests on cpu * wip * profiler: refactoring * wip * add implement for print_profile_events * set-up profiler for graph * set profiler to graph execute * pretty print events * unified log print prefix * print event count * enable optrace * print duration at event end * wip * add more detailed soc information * wip * move device caps array into qnn-lib.cpp * remove lib_name in device_context * move get_graph_key_from_cgraph to graph.cpp * add override type for tensor key * use override_type instead of original data type for graph key * append op type to tensor name to fix error in qwen * remove todo * wip	2025-03-22 12:34:31 +08:00
nullname	c867641222	feat: fix some TODO item in upstream PR #26 (#27 ) * fix warning * wip * add todo for graph key generate * rename some file to meet upstream guideline * remove local .clang-format * expend supported/unsupported counter to all ops * append device name to log * port to ggml logger * fix warning after adapt to ggml logger * append \n to all log * use case op instead of convert * Revert "use case op instead of convert" This reverts commit e662fc2dfee41719aaf7bc9d75e03e8d0f7ded0f. * fix op that needs same shape * opt kQnnOpsTable * refresh params name field when getting op config * opt npu log print * remove unused functions	2025-02-27 23:16:08 +08:00
nullname	a822d00753	feat: run on win (#24 ) * move qnn_instance function implementation into cpp * wip * wip * move dl related function into separated file * use cast op for gpu * Revert "use cast op for gpu" This reverts commit 05df7362a15c022d05940d682e84cf480a082c6a. * Reapply "use cast op for gpu" This reverts commit 2520e5922a216faceb6d7efcde23dafe6947a4b3. * fix compiling error in win * fix align_alloc in win * fix compiling error * add get sys free/total mem for win * wip * suppress warning in win * add missing chrono header * set the correct qnn lib name for windows * add flag to control cpu backend * wip * wip * Revert "Reapply "use cast op for gpu"" This reverts commit f56519c374a7d46faac706cf214de48ff5fc5139. * fix compiling error for linux build * fix cdsprpc dynamic library name * wip * skip rpc load fail * fix page_align_alloc * suppress some warning in gcc * wip * reuse align to function * more log * add log and fix warning * wip * fix asan errors and memory leaks * fix the get_io_tensors_from_graph * improve comment * print GGML_QNN_DEFAULT_LIB_SEARCH_PATH * revert some unused changes * move library search path setter into qnn module * fix android library loading * skip qnn_device_get_platform_info for npu emulator	2025-02-24 10:47:47 +08:00
nullname	f2d8d017da	[feat] Port ggml graph to QNN graph (#16 ) * more log * split graph implementation into cpp file * rename: ggml_qnn_graph -> qnn_graph * add imput/output tensor to graph * fix assert * wip * add _ggml_tensor field in qnn tensor * add comments * add set_data_buffer with raw memory buffer * use set_data_buffer * op param buffer use qnn_buffer_ptr * add qnn_mem_buffer_slice * use qnn_buffer_ptr as tensor buffer * use new set_data_buffer to reduce copy * ggml_qnn_op_config: add function to set input/output tensor before init node * remove ggml_qnn_connectable_op_config and use ggml_qnn_single_op_config instead * wip * add initialize_op_nodes without tensor params * wip * add op caps table * merge kGgmlOpToQnnOp and kOpCaps tables * wip * add cache parameter to create_tensors * add init_from_ggml_graph * disable gelu for all backend * wip * move op index calc to op config module * use the ggml_tensor as parameter of build_graph * add log * use create_operation_from_op_tensor in old build_graph function * remove unused constructors * fix parameter count * remove unused member func/var * make init_from_ggml_graph as a class member: build_graph_from_ggml_graph * move graph finalize into member function `finalize()` * get graph key from ggml op tensor directly * append output type * reduce tensor key length * add function to generate key from ggml_cgraph * simplify graph cache insert and delete * remove template param at get_qnn_graph_from_cache * wip * merge kQnnUnaryOpsTable and kQnnBinaryOpsTable * refactor device_supports_op * add log * wip * use framework function to check same shape * wip * extract some logic into separated function * wip * add execution function that runs graph * add function to create qnn graph from ggml_cgraph with cache * execute graph directly * return null graph key for empty graph * add more qualcomm chipset enums * add cap for reshape * disable some ops * try to skip GGML_OP_VIEW * moew log for view tensor * append param tensor into intermedia tensor key * use 'ordered' set * fix warning in release * wip	2025-01-10 11:13:25 +08:00
nullname	e36ad89528	bugfix: error pre-allocated tensor (k_cache_view-0) (#12 ) * fix device binding at ggml_backend_qnn_buffer_type * merge ggml_backend_qnn_buffer_context and qnn_mem_buffer * wip * add log * wip * add qnn_buffer_ptr * remove tailing `\n` at log * add log * enable GGML_OP_NONE * wip * wip * disable tensor with view * wip * wip * more log for view tensor * re-enable view * wip * remove link android lib * set dimension at bind function * move graph traversal to backend-ops * wip * add get_view_internal_dimension to obtain the tensor view source dimension * use _view_source_dimensions to allocate qnn tensor * add place holder function ggml_backend_qnn_cpy_tensor_async * add ggml_qnn_aggregate_op_config * make matmul based on ggml_qnn_aggregate_op_config * wip * manually specify the order of op destruct * skip register qnn-cpu backend * disable view op again * remove _view_source_dimensions * add nop for reshape and view ops * add log * add comment	2024-12-11 10:42:00 +08:00
nullname	4abaf7d87e	feat: fix mulmat (#2 ) * ggml_qnn_op_config now manager the construction of ggml_qnn_tensor * wip * add interface ggml_qnn_op_config * add ggml_qnn_list_op_config * add create_tensor and move tensor bind to execute * wip * rename: ggml_qnn_list_op_config -> ggml_qnn_matmul_op_config * add tensortype to allow native tensor * remove ggml_tensor param at ggml_qnn_tensor::create_tensor * postpone the tensor id allocation to add_node * add ggml_qnn_op_config_base * trival change to reduct the param of function * split bind_tensors into bind_input_tensors and bind_output_tensors * implement ggml_qnn_single_op_config::create_tensors next will set the prameter of transpose * tensor: add bind buffer * add parameter tensor type * implement add_tensor_param * set qnn_instance only at constructor * set transpose tensor param * move create_op_constructor into op-config module * create QNN_OP_MAT_MUL from ggml_qnn_matmul_op_config * try fix crash * fix compiling error at older ndk (r23c) * fix crash * fix parameter tensor name * update tensor dimension assignment and add TODO * fix mat_mul graph creating * fix MUL_MAT_256x16x10x1_256x1x10x1_16x1x10x1 * append type to graph cache key * wip * fix supported op * update comment * disable op other than add and mat_mul * add convert op to adapt multi input/output format * disable f16 for cpu backend according to official doc https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/cpu_backend.html#supported-operations * add supported data types flags in each backend * remove unused functions * append output type to graph key * fix gpu backend by disable the different data type op * fix cpu backend support ops * fix duplicated tensor name * append op name * suppress warning * remove unused code	2024-10-28 12:48:16 +08:00
hongruichen	2c73791d62	refactoring: remove dup code	2024-07-27 10:48:09 +08:00
hongruichen	ee305cc171	refactoring: split qnn rpc buffer into dedicated class	2024-07-26 22:52:23 +08:00

8 Commits