* chore: Fix compiler warnings, add help text, improve CLI options
* Add prototypes for function definitions
* Invert logic of --no-clean option to be more intuitive
* Provide a new help prompt with clear instructions
* chore : Add ignore rule for vulkan shader generator
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
* Update ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp
Co-authored-by: 0cc4m <picard12@live.de>
* chore : Remove void and apply C++ style empty parameters
* chore : Remove void and apply C++ style empty parameters
---------
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
Co-authored-by: 0cc4m <picard12@live.de>
* Update doc for MUSA
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Add GGML_MUSA in Makefile
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Add GGML_MUSA in CMake
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* CUDA => MUSA
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* MUSA adds support for __vsubss4
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Fix CI build failure
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Apply a loop tiling technique to the generic path, which provides
performance upside for ISAs with enough registers to take advantage
of it. Also helps the compiler optimize this path.
* Add support for float16 tensors in 1d pooling operations
* Add support for float16 input tensors in 2d pooling operations
* code cleanup
remove unnecessary casting during srow ptr initialization
---------
Co-authored-by: vanaka11 <vanaka1189@gmail.com>
This prevents invalid frees when destroying a partially initialized
vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
when running out of device memory.
Co-authored-by: Tony Wasserka <neobrain@users.noreply.github.com>
This commit removes an UNUSED macro call that is not needed as the
variable n0 is used in the code and will not produce a warning.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_on_alloc`.
This fixes it by bailing out if no context is found.
* Improvements for Windows with Snapdragon X
* Revert "Improvements for Windows with Snapdragon X"
This reverts commit bf21397ae5.
* Improvements for Windows with Snapdragon X
* WOA build clarifications
* WIndows on ARM build clarifications
* cmake build for Windows clarifications
* Update docs/build.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
The check gating the use of `__builtin_amdgc_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.