llama.cpp

Commit Graph

Author	SHA1	Message	Date
Neo Zhang	7d2add51d8	sycl : support to malloc memory on device more than 4GB, update the doc and script (#17566) Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>	2025-11-29 14:59:44 +02:00
Neo Zhang Jianyu	704d90c987	Revert "sycl: add usage of enqueue_functions extension (#14244)" (#15910) * Revert "sycl: add usage of enqueue_functions extension (#14244)" This reverts commit `8308f98c7f`. * fix missed revert code, format the code	2025-09-12 09:15:12 +08:00
Akarshan Biswas	cd1fce6d4f	SYCL: Add set_rows support for quantized types (#14883) * SYCL: Add set_rows support for quantized types This commit adds support for GGML_OP_SET_ROWS operation for various quantized tensor types (Q8_0, Q5_1, Q5_0, Q4_1, Q4_0, IQ4_NL) and BF16 type in the SYCL backend. The quantization/dequantization copy kernels were moved from cpy.cpp to cpy.hpp to make them available for set_rows.cpp. This addresses part of the TODOs mentioned in the code. * Use get_global_linear_id() instead ggml-ci * Fix formatting ggml-ci * Use const for ne11 and size_t variables in set_rows_sycl_q ggml-ci * Increase block size for q kernel to 256 ggml-ci * Cleanup imports * Add float.h to cpy.hpp	2025-07-28 20:32:15 +05:30
Nicolò Scipione	8308f98c7f	sycl: add usage of enqueue_functions extension (#14244) * Add header and namespace to use enqueue_functions extension * Convert submit and parallel_for to use new extension in convert.cpp * Convert submit and parallel_for to use extension in ggml-sycl.cpp * Convert submit and parallel_for to use extension in gla.cpp * Convert submit and parallel_for in mmq.cpp * Convert submit and parallel_for in mmvq.cpp * Convert submit and parallel_for in remaining files * Convert all simple parallel_for to nd_launch from enqueue_functions extension * Wrapping extension in general function Create a general function that enable the enqueue_functions extension if it is enable in the compiler, otherwise call the general SYCL function to launch kernels. --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>	2025-06-20 15:07:21 +02:00
Anton Mitkov	0889eba570	sycl: Adding additional cpy dbg print output (#14034)	2025-06-13 08:51:39 +01:00
Akarshan Biswas	228f34c9ce	SYCL: Implement few same quantized type copy kernels (#13739) * SYCL: Implement few same quantized type copy kernels * Use memcpy for copying contiguous tensors ggml-ci * feat(sycl): add contiguous tensor copy support and device checks Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance. * refactor: replace specific block copy functions with template The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed. * Exclude BF16 support for COPY tensors for now ggml-ci * perf: adjust SYCL copy kernel block sizes for efficiency Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.	2025-06-07 18:58:20 +05:30
Romain Biessy	9012eb9b45	sycl: Add more debug prints (#13640)	2025-05-26 10:28:53 +02:00
Akarshan Biswas	ece9745bb8	SYCL: Move CPY kernels to a separate file and add few missing kernels (#12133) * SYCL: refactor and move cpy kernels to a separate file * Add few missing cpy kernels * refactor and add debug logs	2025-03-03 11:07:22 +01:00

8 Commits