llama.cpp

Commit Graph

Author	SHA1	Message	Date
nullname	a2df09b6af	[WIP] feat: perf opt (#10) * reduce log * wip * add function to create concat nodes * opt * insert concat node before mulmat * use resize op * wip * add bind_buffer and remove ggml prefix in tensor types * use gather node instead * fix tensor type, now succeeds on gpu and cpu, fails on npu * add comment * wip * add comment * wip * in destructor, clear internal buffer before unbind * disable gather for npu * wip * count swap memory as free memory * wip * fix supported_types: ggml_backend_device_i.supports_op will be invoked before ggml_backend_device_i.init_backend * rename create_tensors -> initialize_op_nodes * move ggml_qnn_op_config to separated file * wip * add create_convert_nodes * add comment * enable different type in/out for npu and cpu backend * fix npu convert op * enlarge max buffer size * add more error codes * check tensor type before creating convert node * add log * add log * remove transpose0 and use built-in transpose flag * rename transpose1 -> transpose_out * disable convert for npu * add more logs	2024-11-29 00:03:23 +08:00
nullname	e6dbdacc32	feat: fix llama-bench (#7) * remove unused functions * wip * init from last devices * move init into constructor * wip * add static assert to device table * make kDeviceCaps as constexpr * get free memory and total memory * add optimize flag for qnn backend	2024-11-13 17:06:46 +08:00
nullname	4abaf7d87e	feat: fix mulmat (#2) * ggml_qnn_op_config now manages the construction of ggml_qnn_tensor * wip * add interface ggml_qnn_op_config * add ggml_qnn_list_op_config * add create_tensor and move tensor bind to execute * wip * rename: ggml_qnn_list_op_config -> ggml_qnn_matmul_op_config * add tensortype to allow native tensor * remove ggml_tensor param at ggml_qnn_tensor::create_tensor * postpone the tensor id allocation to add_node * add ggml_qnn_op_config_base * trivial change to reduce the params of function * split bind_tensors into bind_input_tensors and bind_output_tensors * implement ggml_qnn_single_op_config::create_tensors; next will set the parameter of transpose * tensor: add bind buffer * add parameter tensor type * implement add_tensor_param * set qnn_instance only at constructor * set transpose tensor param * move create_op_constructor into op-config module * create QNN_OP_MAT_MUL from ggml_qnn_matmul_op_config * try fix crash * fix compiling error at older ndk (r23c) * fix crash * fix parameter tensor name * update tensor dimension assignment and add TODO * fix mat_mul graph creating * fix MUL_MAT_256x16x10x1_256x1x10x1_16x1x10x1 * append type to graph cache key * wip * fix supported op * update comment * disable op other than add and mat_mul * add convert op to adapt multi input/output format * disable f16 for cpu backend according to official doc https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/cpu_backend.html#supported-operations * add supported data types flags in each backend * remove unused functions * append output type to graph key * fix gpu backend by disabling the different data type op * fix cpu backend support ops * fix duplicated tensor name * append op name * suppress warning * remove unused code	2024-10-28 12:48:16 +08:00
hongruichen	181cf52888	adapt new register backend interface and fix missing ops	2024-10-11 10:17:50 +08:00
hongruichen	3b47056c97	refactoring: change the tensor binding mode between qnn tensor and ggml tensor	2024-07-22 23:08:38 +08:00
hongruichen	0301b500cd	refactoring: prevent leaking the QNN_INTERFACE_VER_TYPE and QNN_SYSTEM_INTERFACE_VER_TYPE outside of qnn.hpp	2024-07-17 00:18:38 +08:00
hongruichen	100ccd5e7f	add unary op template and more ops	2024-07-13 00:55:34 +08:00
Hongrui Chen	5f2e3918f6	refactoring ggml_qnn_tensor	2024-07-09 19:58:46 +08:00
hongruichen	13dc3a02c3	use qnn graph inside add and mul ops	2024-07-05 13:27:16 +08:00
hongruichen	4b2ee61f62	move graph map to backend object	2024-07-05 11:58:47 +08:00
hongruichen	000240cf62	add clang format file and reformatting	2024-07-04 23:29:31 +08:00
hongruichen	8b677d1b2f	move qnn backend into sub folder	2024-07-02 19:42:14 +08:00

12 Commits