* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of the "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled (see the sketch below)
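For context, a minimal sketch of the pattern these two commits describe, written for C translation units; the exact conditions in the ggml headers may differ:

    /* Hedged sketch, not the verbatim ggml macro: fall back to MSVC's
     * __restrict only when C11/C17 support is not enabled, otherwise use
     * the standard C99 keyword. */
    #if defined(_MSC_VER) && (!defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L)
    #    define GGML_RESTRICT __restrict
    #else
    #    define GGML_RESTRICT restrict
    #endif

    /* M_PI is only exposed by MSVC's <math.h> when _USE_MATH_DEFINES is
     * defined *before* the header is included, hence the reordering fix. */
    #define _USE_MATH_DEFINES
    #include <math.h>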
---------
Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>
Adds GGML_HIP_ROCWMMA_FATTN and rocwmma header check
Adds rocWMMA support to fattn-wmma-f16
---
Signed-off-by: Carl Klemm <carl@uvos.xyz>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Ben Jackson <ben@ben.com>
* Support fp16 unary operations in the CUDA backend
* cpu: increase fp16 support for unary operators in the CPU backend
* cuda: increase fp16 support for unary operators in the CUDA backend
* Add test cases for fp16 unary operators
* metal: update supports_op for unary operators that don't support fp16, to prevent test-backend-ops from failing
* metal: address PR comments on unary op support after the fp16 unary tests
* Support float16-to-float16 add/sub/mul/div operations in the CUDA backend
* Add fp16 support for add/sub/mul/div on the CPU backend
* Add test cases for fp16 add/sub/mul/div
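A minimal sketch of exercising these fp16 paths through the public ggml API (the real coverage lives in test-backend-ops; this fragment is illustrative only):

    #include "ggml.h"

    static void fp16_ops_sketch(void) {
        struct ggml_init_params params = {
            /*.mem_size   =*/ 16*1024*1024,
            /*.mem_buffer =*/ NULL,
            /*.no_alloc   =*/ false,
        };
        struct ggml_context * ctx = ggml_init(params);

        /* fp16 in, fp16 out - no round-trip through fp32 */
        struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, 1024);
        struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, 1024);
        struct ggml_tensor * u = ggml_gelu(ctx, a);   /* unary op in fp16 */
        struct ggml_tensor * s = ggml_add(ctx, u, b); /* binary op in fp16 */

        (void) s;
        ggml_free(ctx);
    }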
* Upgrade init_tensor API to return a ggml_status
To prepare for an 'abort-free' ggml
(ggml should not abort on OOM but return an OOM status),
as agreed with Diego in the ggml repo,
upgrade the init_tensor() and view_init() APIs
to return a ggml_status (see the sketch below).
* misc fixes
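A hedged sketch of the resulting calling convention, assuming ggml_backend_buffer_init_tensor as the call site:

    /* Initialization now reports failure via ggml_status instead of
     * aborting inside ggml. */
    enum ggml_status st = ggml_backend_buffer_init_tensor(buffer, tensor);
    if (st != GGML_STATUS_SUCCESS) {
        /* e.g. GGML_STATUS_ALLOC_FAILED on OOM - handle it, don't crash */
        return st;
    }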
---------
Co-authored-by: slaren <slarengh@gmail.com>
* debug
* disable reshape
* make sure single-node ops have the same type
* fix warning in the logger
* Revert "disable reshape"
This reverts commit 5aeca4ba9bec6db3f047f9da803df20f9f6612b3.
* vulkan: implement specialized MMV kernels for IQ2 quantizations
* vulkan: add MMV kernels for IQ3 quants
* vulkan: Increase MMV batch size and unroll IQ LUT setup
* vulkan: fix init_iq_shmem for WG sizes larger than tables
* vulkan: common batch size for all I-quants
* Added SVE Support for Q2_K Quantized Models
* Use 4-space indentation in the switch cases
* remove comment lines
* Remove the loop; retain the curly braces for better understanding of the code
* Remove the comment line added for the q3_k_q8_k kernel
---------
Co-authored-by: vithulep <p.m.vithule1517@gmail.com>
* fix warning
* wip
* add todo for graph key generate
* rename some files to meet the upstream guidelines
* remove local .clang-format
* extend the supported/unsupported counters to all ops
* append device name to log
* port to ggml logger
* fix warning after adapt to ggml logger
* append \n to all logs
* use cast op instead of convert
* Revert "use cast op instead of convert"
This reverts commit e662fc2dfee41719aaf7bc9d75e03e8d0f7ded0f.
* fix ops that need the same shape
* opt kQnnOpsTable
* refresh params name field when getting op config
* opt npu log print
* remove unused functions
* Fix dependencies between ggml and backends
ggml backends link only to ggml-base and ggml links to all backends.
* Fix installation of ggml backends
Set up GNUInstallDirs before setting the installation directory of ggml backends
* optimize performance by reordering for Intel GPU
* detect the hardware type, save the opt feature, and print it
* correct name
* optimize the graph once when computing the graph; record the opt status in tensor->extra; make CI pass
* add environment variable GGML_SYCL_DISABLE_OPT for debugging (see the sketch below)
* use syclex::architecture to replace the custom hardware define; update the guide for GGML_SYCL_DISABLE_OPT
* add performance data
* move the getrows functions to separate files
* fix global variables
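A minimal sketch of how a debug switch like this is typically read; the actual parsing in the SYCL backend may differ:

    #include <stdbool.h>
    #include <stdlib.h>

    static bool sycl_opt_disabled(void) {
        const char * v = getenv("GGML_SYCL_DISABLE_OPT");
        return v != NULL && v[0] == '1'; /* assumption: "1" disables the reorder optimization */
    }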
---------
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
* move qnn_instance function implementations into the cpp file
* wip
* wip
* move dl-related functions into a separate file
* use cast op for gpu
* Revert "use cast op for gpu"
This reverts commit 05df7362a15c022d05940d682e84cf480a082c6a.
* Reapply "use cast op for gpu"
This reverts commit 2520e5922a216faceb6d7efcde23dafe6947a4b3.
* fix compile error on Windows
* fix align_alloc on Windows
* fix compile error
* add system free/total memory queries for Windows
* wip
* suppress warnings on Windows
* add missing chrono header
* set the correct QNN lib name for Windows
* add flag to control cpu backend
* wip
* wip
* Revert "Reapply "use cast op for gpu""
This reverts commit f56519c374a7d46faac706cf214de48ff5fc5139.
* fix compile error for the Linux build
* fix cdsprpc dynamic library name
* wip
* skip rpc load fail
* fix page_align_alloc
* suppress some warning in gcc
* wip
* reuse the align_to function (sketched below)
* more log
* add log and fix warning
* wip
* fix asan errors and memory leaks
* fix the get_io_tensors_from_graph
* improve comment
* print GGML_QNN_DEFAULT_LIB_SEARCH_PATH
* revert some unused changes
* move library search path setter into qnn module
* fix android library loading
* skip qnn_device_get_platform_info for npu emulator
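A minimal sketch of the align-up helper the alignment commits above consolidate; the name align_to and its exact form are assumptions:

    #include <stddef.h>

    /* round `size` up to the next multiple of `alignment` (alignment > 0) */
    static inline size_t align_to(size_t size, size_t alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }
    /* e.g. align_to(5, 4) == 8; a page_align_alloc would call this with the OS page size */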
* MUSA: support ARM64 and enable __dp4a etc. (scalar sketch below)
* fix the cross-entropy loss op for MUSA
* update
* add cc info log for MUSA
* add comment for the MUSA .cc calculation block
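For reference, a scalar equivalent of the __dp4a intrinsic enabled above (a 4-way int8 dot product with 32-bit accumulate); the intrinsic itself comes from the toolkit:

    static inline int dp4a_ref(const int a, const int b, int c) {
        const signed char * a8 = (const signed char *) &a;
        const signed char * b8 = (const signed char *) &b;
        for (int i = 0; i < 4; ++i) {
            c += a8[i] * b8[i];
        }
        return c;
    }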
---------
Co-authored-by: Bodhi Hu <huaishun.hu@mthreads.com>
* ggml-cpu: Add CPU backend support for KleidiAI library
* Add environment variable GGML_KLEIDIAI_SME
* Add support for multithread LHS conversion
* Switch kernel selection order to dotprod and i8mm
* updates for review comments
* More updates for review comments
* Reorganize and rename KleidiAI files
* Move ggml-cpu-traits.h to source file
* Update cmake for SME build and add alignment for SME
* Stop appending GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list