Commit Graph

4 Commits

Author SHA1 Message Date
Georgi Gerganov 102cd98074 ggml : Q4_3c using 2x "Full range" approach 2023-04-23 14:56:44 +03:00
slaren 50cb666b8a
Improve cuBLAS performance by using a memory pool (#1094)
* Improve cuBLAS performance by using a memory pool

* Move cuda specific definitions to ggml-cuda.h/cu

* Add CXX flags to nvcc

* Change memory pool synchronization mechanism to a spin lock
General code cleanup
2023-04-21 21:59:17 +02:00
slaren 2005469ea1
Add Q4_3 support to cuBLAS (#1086) 2023-04-20 20:49:53 +02:00
slaren 02d6988121
Improve cuBLAS performance by dequantizing on the GPU (#1065) 2023-04-20 03:14:14 +02:00