Commit Graph

12 Commits

Author SHA1 Message Date
nullname e6dbdacc32
feat: fix llama-bench (#7)
* remove unused functions

* wip

* init from last devices

* move init into constructor

* wip

* add static assert to device table

* make kDeviceCaps constexpr (see the sketch after this commit)

* get free memory and total memory

* add optimization flag for qnn backend
2024-11-13 17:06:46 +08:00
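Below is a minimal sketch of the constexpr device-capability table with a static_assert described in this commit ("add static assert to device table", "make kDeviceCaps constexpr"). The name kDeviceCaps comes from the commit message; the backend enum, struct fields, and values are illustrative assumptions, not the repository's actual code.

```cpp
#include <array>
#include <cstdint>

// Hypothetical backend indices; the real backend list lives in the QNN backend sources.
enum backend_index { QNN_BACKEND_CPU = 0, QNN_BACKEND_GPU, QNN_BACKEND_NPU, QNN_BACKEND_COUNT };

struct device_caps {
    const char *name;            // human-readable backend name
    uint64_t    supported_types; // hypothetical bitmask of supported data types
};

// constexpr: the table is a compile-time constant, so it costs nothing at startup.
constexpr std::array<device_caps, QNN_BACKEND_COUNT> kDeviceCaps = {{
    {"qnn-cpu", 0b0001},
    {"qnn-gpu", 0b0011},
    {"qnn-npu", 0b0111},
}};

// The static_assert catches a table that falls out of sync with the backend enum.
static_assert(kDeviceCaps.size() == QNN_BACKEND_COUNT,
              "kDeviceCaps must have exactly one entry per backend");
```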
nullname 4abaf7d87e
feat: fix mulmat (#2)
* ggml_qnn_op_config now manages the construction of ggml_qnn_tensor

* wip

* add interface ggml_qnn_op_config

* add ggml_qnn_list_op_config

* add create_tensor and move tensor bind to execute

* wip

* rename: ggml_qnn_list_op_config -> ggml_qnn_matmul_op_config

* add tensor type to allow native tensors

* remove ggml_tensor param at ggml_qnn_tensor::create_tensor

* postpone the tensor id allocation to add_node

* add ggml_qnn_op_config_base

* trivial change to reduce the function's parameters

* split bind_tensors into bind_input_tensors and bind_output_tensors

* implement ggml_qnn_single_op_config::create_tensors

next will set the parameter of the transpose

* tensor: add bind buffer

* add parameter tensor type

* implement add_tensor_param

* set qnn_instance only at constructor

* set transpose tensor param

* move create_op_constructor into op-config module

* create QNN_OP_MAT_MUL from ggml_qnn_matmul_op_config

* try fix crash

* fix compiling error at older ndk (r23c)

* fix crash

* fix parameter tensor name

* update tensor dimension assignment and add TODO

* fix mat_mul graph creation

* fix MUL_MAT_256x16x10x1_256x1x10x1_16x1x10x1

* append type to graph cache key (see the cache-key sketch after this commit)

* wip

* fix supported op

* update comment

* disable op other than add and mat_mul

* add convert op to adapt to multiple input/output formats

* disable f16 for cpu backend according to official doc

https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/cpu_backend.html#supported-operations

* add supported data types flags in each backend

* remove unused functions

* append output type to graph key

* fix gpu backend by disabling ops with differing data types

* fix cpu backend supported ops

* fix duplicated tensor name

* append op name

* suppress warning

* remove unused code
2024-10-28 12:48:16 +08:00
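The "append type to graph cache key", "append output type to graph key", and "append op name" items above all serve the same purpose: two graphs that differ only in tensor types must not collide in the graph cache. A rough sketch of such a key follows, assuming hypothetical names (make_graph_key, tensor_type) rather than the repository's actual API.

```cpp
#include <string>
#include <vector>

// Placeholder for the backend's tensor/data types.
enum class tensor_type { f32, f16, q4_0 };

static const char *to_string(tensor_type t) {
    switch (t) {
        case tensor_type::f32:  return "f32";
        case tensor_type::f16:  return "f16";
        case tensor_type::q4_0: return "q4_0";
    }
    return "unknown";
}

// Encode op name, every input type, and the output type into the cache key so
// graphs built for different type combinations get distinct cache entries.
std::string make_graph_key(const std::string &op_name,
                           const std::vector<tensor_type> &inputs,
                           tensor_type output) {
    std::string key = op_name;
    for (tensor_type t : inputs) {
        key += '_';
        key += to_string(t);
    }
    key += "_out_";
    key += to_string(output);
    return key; // e.g. "MUL_MAT_f16_f32_out_f32"
}
```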
hongruichen 867c91bfaf feat: add error string for QnnOpPackage_Error_t 2024-07-27 13:24:57 +08:00
hongruichen 27299463ae fix: try to fix tensor type error 2024-07-20 15:13:10 +08:00
hongruichen 1679dcf47e fix: check all dimensions in `can offload` 2024-07-20 13:29:01 +08:00
hongruichen b1b5cc10b1 add function to convert qnn error into string (see the sketch at the end of this list) 2024-07-19 22:51:17 +08:00
hongruichen bb13795dce refactoring: remove unused functions and variables 2024-07-17 14:17:35 +08:00
hongruichen 63dc587dff refactoring: make the buffer alloc and free stay in same class 2024-07-17 14:08:31 +08:00
hongruichen 4b0f6b0cd6 add helper function to get Qnn_TensorType_t from ggml_tensor 2024-07-05 19:37:58 +08:00
hongruichen 0f2e68713c move tensor related function to utils 2024-07-05 19:02:38 +08:00
hongruichen 000240cf62 add clang-format file and reformatting 2024-07-04 23:29:31 +08:00
hongruichen 8b677d1b2f move qnn backend into sub folder 2024-07-02 19:42:14 +08:00
Renamed from ggml-qnn/utils.cpp
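Two of the commits above ("add error string for QnnOpPackage_Error_t", "add function to convert qnn error into string") add helpers that turn QNN error codes into readable strings for logging. Here is a small self-contained sketch of that pattern; the enum values below are placeholders, not the actual QNN SDK error codes.

```cpp
#include <cstdint>

// Placeholder error codes standing in for a QNN error enum.
enum class qnn_error : uint32_t {
    no_error            = 0,
    invalid_argument    = 1,
    unsupported_feature = 2,
};

// Returning string literals keeps the helper allocation-free, which is
// convenient when formatting log messages on the error path.
constexpr const char *qnn_error_to_string(qnn_error err) {
    switch (err) {
        case qnn_error::no_error:            return "no error";
        case qnn_error::invalid_argument:    return "invalid argument";
        case qnn_error::unsupported_feature: return "unsupported feature";
    }
    return "unknown error";
}
```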