llama.cpp/ggml/src
Alberto Cabrera Pérez 17512a94d6
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858)
* sycl : Implemented reorder Q4_0 mmvq

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

* sycl : Fixed mmvq being called when reorder is disabled

* sycl : Improved comments in the quants header

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

* Use static_assert

* safe_div -> ceil_div

* Clarify qi comment

* change the reorder tensor from init to execute OP

* dbg

* Undo changes to test-backend-ops

* Refactor changes on top of q4_0 reorder fix

* Missing Reverts

* Refactored opt_for_reorder logic to simplify code path

* Explicit inlining and unroll

* Renamed mul_mat_algo enum for consistency

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
Co-authored-by: romain.biessy <romain.biessy@codeplay.com>
2025-05-09 16:34:08 +01:00
ggml-blas ggml : add support for dynamic loading of backends (#10469) 2024-11-25 15:13:39 +01:00
ggml-cann CANN: Add support for async operator submission (#12864) 2025-04-17 20:34:16 +08:00
ggml-cpu whisper: remove MSVC warnings pragmas (whisper/3090) 2025-05-07 17:28:36 +03:00
ggml-cuda CUDA: FA support for Deepseek (Ampere or newer) (#13306) 2025-05-09 13:34:58 +02:00
ggml-hip CUDA/HIP: Share the same unified memory allocation logic. (#12934) 2025-04-15 11:20:38 +02:00
ggml-kompute llama : add Qwen2VL support + multimodal RoPE (#10361) 2024-12-14 14:43:46 +02:00
ggml-metal metal : optimize MoE for large batches (#13388) 2025-05-09 15:14:56 +03:00
ggml-musa cuda : enable CUDA Graph on CUDA Toolkit < 12.x (#12394) 2025-03-17 20:25:13 +02:00
ggml-opencl opencl: fix incorrect local_size index in profiling log (#12868) 2025-04-16 14:25:57 -07:00
ggml-rpc rpc : add rpc_msg_set_tensor_hash_req (#13353) 2025-05-09 10:31:07 +03:00
ggml-sycl sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858) 2025-05-09 16:34:08 +01:00
ggml-vulkan vulkan: Allow up to 4096 elements for mul_mat_id row_ids (#13326) 2025-05-09 09:23:41 +02:00
CMakeLists.txt cmake : removed stdc++fs (whisper/3097) 2025-05-07 17:28:36 +03:00
ggml-alloc.c ggml: Don't assert fail when tensor data changes (#13222) 2025-05-01 22:46:10 +02:00
ggml-backend-impl.h ggml : upgrade init_tensor API to return a ggml_status (#11854) 2025-02-28 14:41:47 +01:00
ggml-backend-reg.cpp ggml-backend : fix backend search path (#12330) 2025-03-11 14:25:17 +01:00
ggml-backend.cpp CUDA: fix logic for clearing padding with -ngl 0 (#13320) 2025-05-05 22:32:13 +02:00
ggml-common.h musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (#12611) 2025-03-30 10:59:38 +02:00
ggml-impl.h ggml: don't include arm_neon.h when using CUDA 12 with ARM Neon (ggml/1187) 2025-04-11 00:17:47 +03:00
ggml-opt.cpp ggml-opt: fix data corruption (ggml/1022) 2024-11-21 09:22:02 +02:00
ggml-quants.c whisper: remove MSVC warnings pragmas (whisper/3090) 2025-05-07 17:28:36 +03:00
ggml-quants.h ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.h remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) 2024-12-12 19:02:49 +01:00
ggml.c metal : optimize MoE for large batches (#13388) 2025-05-09 15:14:56 +03:00
gguf.cpp Fix clang warning in gguf_check_reserved_keys (#12686) 2025-04-01 13:12:53 +02:00