llama.cpp

Commit Graph

Author	SHA1	Message	Date
Johannes Gäßler	2e1c9cd814	CUDA: generalized (mma) FA, add Volta support (#17505) * CUDA: generalized (mma) FA, add Volta support * use struct for MMA FA kernel config --------- Co-authored-by: Aman Gupta <aman>	2025-12-03 16:57:05 +01:00
yulo	028f93ef98	HIP: RDNA4 tensor core support for MMF (#17077) * mmf for rdna4 * align the padding for rdna4 * forbit mul_mat_f for rdna4 * fix as comment * remove device kernels * add constexpr for early return * update based on review comment * change based on the review comment * pass compile error * keep code consistency --------- Co-authored-by: zhang hui <you@example.com>	2025-11-22 00:03:24 +01:00
Johannes Gäßler	aa374175c3	CUDA: fix crash on uneven context without FA (#16988)	2025-11-06 14:05:47 +01:00
Johannes Gäßler	31c511a968	CUDA: Volta tensor core support for MMF (#16843) * CUDA: Volta tensor core support for MMF * more generic checks for hardware support * Update ggml/src/ggml-cuda/mmf.cuh Co-authored-by: Aman Gupta <amangupta052@gmail.com> --------- Co-authored-by: Aman Gupta <amangupta052@gmail.com>	2025-10-31 15:57:19 +01:00
Aman Gupta	48e2fa9fb7	CUDA: add fp kernel for larger batch size MoE (#16512) * CUDA: kernel for larger batch sizes for MoE * WIP * WIP * WIP * WIP * WIP * WIP * fixup * tests * Move mmq_ids_helper to mmid * cleanup * Remove redundant checks	2025-10-14 13:15:15 +02:00
Aman Gupta	c0bfc57af4	CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#16277) * CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 This commit adds mul_mat_id support for ncols_dst >= 16. It does this by packing ncols_dst tiles into the blockDim.y. My tests on a RTX 3090 show that this is faster than the cuBLAS fallback for f16 till bs=64, and for f32 till bs=32 * Review: refactor if statement	2025-09-27 18:49:32 +02:00
Aman Gupta	106220562a	CUDA: some micro-optimizations in mmf.cuh for mul_mat_id (#15926)	2025-09-15 17:35:11 +08:00
Aman Gupta	a972faebed	CUDA: Add mul_mat_id support for the mmf kernel (#15767) * CUDA: Add mul_mat_id support the mmf Add support for mul_mat_id for bs < 16 * Review: use warp_size, fix should_use_mmf condition * Launch one block per expert, stride along n_expert_used * templatize mul_mat_id * Pad shmem to 16 bytes, add helper function mul_mat_f_switch_ids * Reduce compile times by dividing mmf into f16, bf16 and f32 variants * Divide mmf by ncols_dst * Add missing files * Fix MUSA/HIP builds	2025-09-09 14:38:02 +08:00
Johannes Gäßler	1d72c84188	CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (#15131) * CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16	2025-08-07 10:53:21 +02:00

9 Commits