Commit Graph

4 Commits

Author SHA1 Message Date
Aman Gupta 8bece2eb20
CUDA: use mmvq for mul-mat-id for small batch sizes (#18958)
* CUDA: use mmvq for mul-mat-id for small batch sizes

* add mmvq too

* Fix perf issue on ampere. Use mmvf mm-id only for non-nvidia GPUs

* templatize multi_token_path
2026-02-03 23:31:23 +08:00
Johannes Gäßler aa374175c3
CUDA: fix crash on uneven context without FA (#16988) 2025-11-06 14:05:47 +01:00
Aman Gupta f77c13b91f
CUDA: General GEMV fusion (#16715) 2025-10-26 19:28:04 +08:00
Johannes Gäßler 1d72c84188
CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (#15131)
* CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16
2025-08-07 10:53:21 +02:00
Renamed from ggml/src/ggml-cuda/mmv.cuh (Browse further)