llama.cpp

Author	SHA1	Message	Date
zhou.weiguo	5269e082aa	ggml-qnn: refine ggml inference using QNN NPU	2024-06-11 23:05:00 +08:00
zhou.weiguo	5f8cfe4a1e	ggml-qnn: refine source code of ggml-qnn.cpp to make reviewer more happy	2024-06-10 20:07:26 +08:00
zhou.weiguo	d38d4a67d1	npu: probe htp info and capacity of rpc ion memory	2024-06-09 23:49:54 +08:00
zhou.weiguo	3e8b61f970	review: fix a memory leak introduced by review modification, which is explained in https://github.com/zhouwg/llama.cpp/pull/1	2024-06-09 09:06:44 +08:00
zhou.weiguo	fdf0272dfb	review: code format using clang-format + manual modification according to review comments	2024-06-08 17:56:32 +08:00
zhou.weiguo	5d691c6cd0	review: put qnn's internal log inside preprocessor directive	2024-06-08 09:22:39 +08:00
zhou.weiguo	94ee775058	review: remove static global vars to support multi-instance simultaneously and thread safe	2024-06-07 14:56:07 +08:00
zhou.weiguo	2fab33d825	ggml-qnn: remove static global vars to support multi-instance simultaneously	2024-06-07 12:51:04 +08:00
zhou.weiguo	f4c53037ab	review: remove unused QNN helper functions	2024-06-06 20:24:03 +08:00
zhou.weiguo	dd29834c11	add support for quantized data type Q8_0	2024-06-06 17:12:28 +08:00
zhou.weiguo	926a8661f3	review: replace external declaration with NDK header file	2024-06-05 21:10:59 +08:00
zhou.weiguo	d325088dbf	ggml: add Qualcomm QNN (Qualcomm Neural Network, aka Qualcomm AI Engine Direct) backend	2024-06-05 10:55:45 +08:00