llama.cpp/ggml
Aman Gupta 55a1c5a5fd CUDA: add softmax broadcast (#14475)
* CUDA: add softmax broadcast

* Pass by const ref

* Review: Use blockDims for indexing, remove designated initializers

* Add TODO for noncontiguous input/output
2025-07-02 15:48:33 +03:00
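
The broadcast added by this commit lets the softmax mask be smaller than the input along the batch dimensions and be reused across rows rather than materialized as a full-size copy. Below is a minimal CUDA sketch of that idea, not the actual llama.cpp kernel: the kernel name, the modulo-based broadcast of mask rows, and the omission of the scale/ALiBi parameters are simplifying assumptions. One block handles one row, and threads stride over the row using blockDim.x, in the spirit of the "use blockDims for indexing" review note.

#include <cuda_runtime.h>
#include <math.h>

#define BLOCK_SIZE 256

// Hypothetical sketch of a row-wise softmax with a broadcast mask.
// Launch with one block per input row and BLOCK_SIZE threads per block.
__global__ void soft_max_bcast(const float * x, const float * mask,
                               float * dst, int ncols, int mask_rows) {
    const int row      = blockIdx.x;
    const int mask_row = row % mask_rows;          // broadcast the mask over rows
    const float * xr = x    + (size_t) row      * ncols;
    const float * mr = mask + (size_t) mask_row * ncols;
    float       * dr = dst  + (size_t) row      * ncols;

    __shared__ float buf[BLOCK_SIZE];

    // 1) row maximum of x + mask, for numerical stability
    float vmax = -INFINITY;
    for (int i = threadIdx.x; i < ncols; i += blockDim.x) {
        vmax = fmaxf(vmax, xr[i] + mr[i]);
    }
    buf[threadIdx.x] = vmax;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) buf[threadIdx.x] = fmaxf(buf[threadIdx.x], buf[threadIdx.x + s]);
        __syncthreads();
    }
    vmax = buf[0];
    __syncthreads();

    // 2) exponentiate and accumulate the row sum
    float vsum = 0.0f;
    for (int i = threadIdx.x; i < ncols; i += blockDim.x) {
        const float e = expf(xr[i] + mr[i] - vmax);
        dr[i] = e;
        vsum += e;
    }
    buf[threadIdx.x] = vsum;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) buf[threadIdx.x] += buf[threadIdx.x + s];
        __syncthreads();
    }
    vsum = buf[0];
    __syncthreads();

    // 3) normalize
    for (int i = threadIdx.x; i < ncols; i += blockDim.x) {
        dr[i] /= vsum;
    }
}

A host-side launch would look like soft_max_bcast<<<nrows, BLOCK_SIZE>>>(x, mask, dst, ncols, mask_rows); the real kernel additionally handles the scale factor and the non-contiguous layouts noted in the commit's TODO.
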
cmake            ggml-cpu : rework weak alias on apple targets (#14146)                2025-06-16 13:54:15 +08:00
include          ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (#14435)  2025-07-02 15:48:33 +03:00
src              CUDA: add softmax broadcast (#14475)                                  2025-07-02 15:48:33 +03:00
.gitignore       vulkan : cmake integration (#8119)                                    2024-07-13 18:12:39 +02:00
CMakeLists.txt   ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317)                  2025-06-25 23:49:04 +02:00