llama.cpp/ggml/src/ggml-vulkan
Rémy Oudompheng 66ee4f297c
vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)
* vulkan: initial support for IQ3_S

* vulkan: initial support for IQ3_XXS

* vulkan: initial support for IQ2_XXS

* vulkan: initial support for IQ2_XS

* vulkan: optimize Q3_K by removing branches

* vulkan: implement dequantize variants for coopmat2

* vulkan: initial support for IQ2_S

* vulkan: vertically realign code

* port failing dequant callbacks from mul_mm

* Fix array length mismatches

* vulkan: avoid using workgroup size before it is referenced

* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-01-29 18:29:39 +01:00
cmake            fix: ggml: fix vulkan-shaders-gen build (#10448)                          2025-01-15 14:17:42 +01:00
vulkan-shaders   vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)  2025-01-29 18:29:39 +01:00
CMakeLists.txt   fix: ggml: fix vulkan-shaders-gen build (#10448)                          2025-01-15 14:17:42 +01:00
ggml-vulkan.cpp  vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)  2025-01-29 18:29:39 +01:00