llama.cpp

Commit Graph

Author	SHA1	Message	Date
nullname	e36ad89528	bugfix: error pre-allocated tensor (k_cache_view-0) (#12) * fix device binding at ggml_backend_qnn_buffer_type * merge ggml_backend_qnn_buffer_context and qnn_mem_buffer * wip * add log * wip * add qnn_buffer_ptr * remove trailing `\n` at log * add log * enable GGML_OP_NONE * wip * wip * disable tensor with view * wip * wip * more log for view tensor * re-enable view * wip * remove link android lib * set dimension at bind function * move graph traversal to backend-ops * wip * add get_view_internal_dimension to obtain the tensor view source dimension * use _view_source_dimensions to allocate qnn tensor * add placeholder function ggml_backend_qnn_cpy_tensor_async * add ggml_qnn_aggregate_op_config * make matmul based on ggml_qnn_aggregate_op_config * wip * manually specify the order of op destruct * skip register qnn-cpu backend * disable view op again * remove _view_source_dimensions * add nop for reshape and view ops * add log * add comment	2024-12-11 10:42:00 +08:00
hongruichen	0d02ee09ed	fix int overflow and remove view op to pass unit test	2024-12-03 10:55:11 +08:00
hongruichen	c5e6549331	fix: fix assertion	2024-11-29 23:38:06 +08:00
hongruichen	09efaa389e	define compile flag as module private	2024-11-29 17:24:05 +08:00
hongruichen	6d4feae579	redo conflict changes	2024-11-29 17:14:01 +08:00
hongruichen	5103b166ba	bugfix: block large tensor calc in npu	2024-11-29 14:19:34 +08:00
nullname	a2df09b6af	[WIP] feat: perf opt (#10) * reduce log * wip * add function to create concat nodes * opt * insert concat node before mulmat * use resize op * wip * add bind_buffer and remove ggml prefix in tensor types * use gather node instead * fix tensor type, now succeed in gpu and cpu, failed in npu * add comment * wip * add comment * wip * in destructor, clear internal buffer before unbind * disable gather for npu * wip * count swap memory as free memory * wip * fix supported_types: ggml_backend_device_i.supports_op will be invoked before ggml_backend_device_i.init_backend * rename create_tensors -> initialize_op_nodes * move ggml_qnn_op_config to separate file * wip * add create_convert_nodes * add comment * enable different type in/out for npu and cpu backend * fix npu convert op * enlarge max buffer size * add more error code * check tensor type before create convert node * add log * add log * remove transpose0 and use built-in transpose flag * rename transpose1 -> transpose_out * disable convert for npu * add more logs	2024-11-29 00:03:23 +08:00
nullname	e6dbdacc32	feat: fix llama-bench (#7) * remove unused functions * wip * init from last devices * move init into constructor * wip * add static assert to device table * make kDeviceCaps as constexpr * get free memory and total memory * add optimize flag for qnn backend	2024-11-13 17:06:46 +08:00
nullname	8ad86dc703	feat: add QNN_OP_TRANSPOSE (#6) * redo: add convert nodes This reverts commit 8448acd5ebf8fe86ab9d25313b64a15c811ef96e. * align clang format with cann * rename binary_op -> general_op because there are some ops that will only take 1 param * Revert "rename binary_op -> general_op" This reverts commit 5be63b1a0dc4614457785367dade62158fe46214. * wip * add GGML_OP_PERMUTE * add GGML_OP_VIEW and GGML_OP_GET_ROWS * wip * Revert "wip" This reverts commit 772462ca6cfa01ea31bde725c2da60076ad9385f.	2024-11-04 23:12:03 +08:00
nullname	fe565cfd9f	fix compiling error in release	2024-10-29 15:47:07 +08:00
hongruichen	5c1e6d4905	disable gelu in NPU	2024-10-29 00:54:08 +08:00
nullname	4abaf7d87e	feat: fix mulmat (#2) * ggml_qnn_op_config now manages the construction of ggml_qnn_tensor * wip * add interface ggml_qnn_op_config * add ggml_qnn_list_op_config * add create_tensor and move tensor bind to execute * wip * rename: ggml_qnn_list_op_config -> ggml_qnn_matmul_op_config * add tensortype to allow native tensor * remove ggml_tensor param at ggml_qnn_tensor::create_tensor * postpone the tensor id allocation to add_node * add ggml_qnn_op_config_base * trivial change to reduce the params of function * split bind_tensors into bind_input_tensors and bind_output_tensors * implement ggml_qnn_single_op_config::create_tensors; next will set the parameter of transpose * tensor: add bind buffer * add parameter tensor type * implement add_tensor_param * set qnn_instance only at constructor * set transpose tensor param * move create_op_constructor into op-config module * create QNN_OP_MAT_MUL from ggml_qnn_matmul_op_config * try fix crash * fix compiling error at older ndk (r23c) * fix crash * fix parameter tensor name * update tensor dimension assignment and add TODO * fix mat_mul graph creating * fix MUL_MAT_256x16x10x1_256x1x10x1_16x1x10x1 * append type to graph cache key * wip * fix supported op * update comment * disable op other than add and mat_mul * add convert op to adapt multi input/output format * disable f16 for cpu backend according to official doc https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/cpu_backend.html#supported-operations * add supported data types flags in each backend * remove unused functions * append output type to graph key * fix gpu backend by disabling the different data type op * fix cpu backend support ops * fix duplicated tensor name * append op name * suppress warning * remove unused code	2024-10-28 12:48:16 +08:00
hongruichen	181cf52888	adapt new register backend interface and fix missing ops	2024-10-11 10:17:50 +08:00
hongruichen	1da8a3e678	fix compiling error after merge	2024-09-30 10:37:23 +08:00
Hongrui Chen	a1ceaae4ad	fix compiling error at older ndk (r23c)	2024-09-30 10:18:12 +08:00
hongruichen	481cb3a0c5	fix compiling error	2024-09-07 12:29:26 +08:00
みゃん	dedadf2a20	Fixed a bug where debug code was included in the release, resulting i… (#1) * Fixed a bug where debug code was included in the release, resulting in an undefined function error. * Change the path of the QNN library when building in termux environment * Revert "Change the path of the QNN library when building in termux environment" This reverts commit c6e26a3679da2608940e2163e090adf75d667400. * Changed so that GGML_QNN_DEFAULT_LIB_SEARCH_PATH can be set from command line arguments	2024-08-20 10:20:23 +08:00
hongruichen	47f6e02eda	fix: try fix the tensor rank of mul mat	2024-07-31 23:54:07 +08:00
hongruichen	74eb05a13b	feat: add ggml_qnn_op_config for handle different op	2024-07-31 20:22:37 +08:00
hongruichen	9a5f802bb6	refactoring: add convenient macro to disable copy and move of class	2024-07-29 22:18:48 +08:00
hongruichen	6da82947df	refactoring: set the default qnn lib search path at CMakeLists.txt by GGML_QNN_DEFAULT_LIB_SEARCH_PATH	2024-07-29 15:53:14 +08:00
hongruichen	1f9d2a7e22	refactoring: improve tensor print	2024-07-28 22:05:51 +08:00
hongruichen	e33b5c9837	refactoring: print the name of unsupported op	2024-07-27 13:49:49 +08:00
hongruichen	8ab1f15fe3	refactoring: remove internal functions, use op table directly	2024-07-27 13:43:07 +08:00
hongruichen	e0c9b34016	feat: check if dims equal for add; it looks like qnn add can only be applied to matrices with equal dimensions	2024-07-27 13:38:12 +08:00
hongruichen	5da73f8085	refactoring: move forward and supports_op into ops file	2024-07-27 13:24:57 +08:00
hongruichen	867c91bfaf	feat: add error string for QnnOpPackage_Error_t	2024-07-27 13:24:57 +08:00
hongruichen	ccfec70106	refactoring: remove unused get_rpcmem_from_memhandle func	2024-07-27 13:24:57 +08:00
hongruichen	2c73791d62	refactoring: remove dup code	2024-07-27 10:48:09 +08:00
hongruichen	18aa6654d5	refactoring: opt graph key gen	2024-07-27 10:39:07 +08:00
hongruichen	47735cb589	fix: try fix error in 2nd run by appending dimension into graph key	2024-07-26 23:04:53 +08:00
hongruichen	ee305cc171	refactoring: split qnn rpc buffer into dedicated class	2024-07-26 22:52:23 +08:00
hongruichen	f843e5aaf5	fix: 1. free up rpc memory at destruct 2. unbind tensor	2024-07-25 23:45:04 +08:00
hongruichen	706793f078	fix: back to qnn tensor v1 to fix the create tensor error	2024-07-22 23:08:38 +08:00
hongruichen	3b47056c97	refactoring: change the tensor binding mode between qnn tensor and ggml tensor	2024-07-22 23:08:38 +08:00
hongruichen	b173c4e061	feat: update tensor name when bind to graph	2024-07-20 17:31:40 +08:00
hongruichen	5f3b1ae3b0	fix: try fix graph cache with append the tensors name	2024-07-20 16:39:06 +08:00
hongruichen	51f95d6980	fix: dimension could be wrong for tensors like 1x1x8	2024-07-20 16:11:35 +08:00
hongruichen	27299463ae	fix: try fix tensor type error	2024-07-20 15:13:10 +08:00
hongruichen	28a00e5e6c	fix: try fix QNN_GRAPH_ERROR_INVALID_OP_CONFIG	2024-07-20 14:11:58 +08:00
hongruichen	1679dcf47e	fix: check all dimensions in `can offload`	2024-07-20 13:29:01 +08:00
hongruichen	b1b5cc10b1	add function to convert qnn error into string	2024-07-19 22:51:17 +08:00
hongruichen	a607995f95	Reapply "tried fix the add node error 6005" This reverts commit `f45fbec8f4`.	2024-07-19 15:35:55 +08:00
hongruichen	f45fbec8f4	Revert "tried fix the add node error 6005" This reverts commit `ce3d09e5f2`.	2024-07-19 12:59:38 +08:00
hongruichen	ce3d09e5f2	tried fix the add node error 6005	2024-07-19 12:59:21 +08:00
hongruichen	15f5cc450c	bug: fix allocation size overflow at log	2024-07-18 19:44:05 +08:00
hongruichen	d82b3a0bdb	feat: add GGML_UNARY_OP_GELU	2024-07-18 11:15:48 +08:00
hongruichen	ce199b2de7	refactoring: downgrade some log to debug level	2024-07-17 23:49:47 +08:00
hongruichen	c76fc9aa2f	fix warnings	2024-07-17 23:32:13 +08:00
hongruichen	6457a68bd7	disable qnn profiling in release build	2024-07-17 23:24:29 +08:00


84 Commits