Johannes Gäßler
b284197df4
CUDA: fix compilation with GGML_CUDA_F16 ( #14837 )
2025-07-23 18:22:30 +02:00
Johannes Gäßler
07a19e27a2
CUDA: fix quantized KV cache + multiple sequences ( #14822 )
...
* CUDA: fix quantized KV cache + multiple sequences
* Update ggml/src/ggml-cuda/fattn-common.cuh
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-07-23 14:08:09 +03:00
Aman Gupta
27208bf657
CUDA: add bf16 and f32 support to cublas_mul_mat_batched ( #14361 )
...
* CUDA: add bf16 and f32 support to cublas_mul_mat_batched
* Review: add type traits and make function more generic
* Review: make check more explicit, add back comments, and fix formatting
* Review: fix formatting, remove useless type conversion, fix naming for bools
2025-06-29 01:30:53 +08:00
Johannes Gäßler
cdf76586b2
CUDA: fix non-cont. inputs for batched mat mul ( #13155 )
2025-04-29 16:00:27 +02:00
Georgi Gerganov
1a1ab7e7a4
cuda : fix HIP and MUSA BF16 ( #0 )
...
ggml-ci
2025-04-07 18:44:17 +03:00
Sigbjørn Skjæret
36ca8b3628
CUDA: don't convert BF16 weights to FP32 (ggml/1174)
...
* add bf16 support
* use convert_from_bf16_cuda instead of convert_unary_cuda for f32
* revert 7ec5085
* move functionality into convert_unary with constexpr
2025-04-07 18:44:17 +03:00
R0CKSTAR
492d7f1ff7
musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc ( #12611 )
...
* musa: fix all warnings
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: enable -DLLAMA_FATAL_WARNINGS=ON in run.sh
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: update ci doc (install ccache)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* fix Windows build issue
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Address review comments
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Address review comments
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-30 10:59:38 +02:00
Johannes Gäßler
b9ab0a4d0b
CUDA: use arch list for compatibility check ( #11775 )
...
* CUDA: use arch list for feature availability check
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-11 00:17:22 +01:00
Johannes Gäßler
46e3556e01
CUDA: add BF16 support ( #11093 )
...
* CUDA: add BF16 support
2025-01-06 02:33:52 +01:00
Andreas Kieslinger
750cb3e246
CUDA: rename macros to avoid conflicts with WinAPI ( #10736 )
...
* Renames NVIDIA GPU-architecture flags to avoid name clashes with WinAPI. (e.g. CC_PASCAL, GPU architecture or WinAPI pascal compiler flag?)
* Reverts erroneous rename in SYCL-code.
* Renames GGML_CUDA_MIN_CC_DP4A to GGML_CUDA_CC_DP4A.
* Renames the rest of the compute capability macros for consistency.
2024-12-10 18:23:24 +01:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake ( #8006 )
...
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122 )
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00