Commit Graph

138 Commits

Author SHA1 Message Date
hongruichen 5ecbeb5842 Merge branch 'master' into dev-refactoring
# Conflicts:
#	CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-backend.c
#	src/llama.cpp
2024-07-29 10:26:39 +08:00
hongruichen 1f9d2a7e22 refactoring: improve tensor print 2024-07-28 22:05:51 +08:00
Austin 4730faca61
chore : Fix vulkan related compiler warnings, add help text, improve CLI options (#8477)
* chore: Fix compiler warnings, add help text, improve CLI options

* Add prototypes for function definitions
* Invert logic of --no-clean option to be more intuitive
* Provide a new help prompt with clear instructions

* chore : Add ignore rule for vulkan shader generator

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* Update ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp

Co-authored-by: 0cc4m <picard12@live.de>

* chore : Remove void and apply C++ style empty parameters

* chore : Remove void and apply C++ style empty parameters

---------

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
Co-authored-by: 0cc4m <picard12@live.de>
2024-07-28 09:52:42 +02:00
R0CKSTAR e54c35e4fb
feat: Support Moore Threads GPU (#8383)
* Update doc for MUSA

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Add GGML_MUSA in Makefile

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Add GGML_MUSA in CMake

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* CUDA => MUSA

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* MUSA adds support for __vsubss4

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Fix CI build failure

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-07-28 01:41:25 +02:00
Georgi Gerganov 345c8c0c87 ggml : add missing semicolon (#0)
ggml-ci
2024-07-27 17:43:44 +03:00
Mahesh Madhav a05ca93697 ggml : loop tiling optimizations for scalar path (ggml/898)
Apply a loop tiling technique to the generic path, which provides
performance upside for ISAs with enough registers to take advantage
of it. Also helps the compiler optimize this path.
2024-07-27 17:43:44 +03:00
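A minimal sketch of the loop tiling idea mentioned in the commit above, shown on a plain matrix-vector product rather than on the actual ggml kernel. The function name `matvec_tiled` and the tile size of 4 are assumptions made for illustration only. Each tile keeps four accumulators live at once, so every load of `x[j]` is reused four times, which only pays off when the target has enough registers to hold them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile size: number of output rows computed together.
constexpr std::size_t TILE = 4;

// Hypothetical helper, not the ggml implementation:
// computes y = A * x for a row-major matrix A of shape rows x cols.
std::vector<float> matvec_tiled(const std::vector<float>& a,
                                const std::vector<float>& x,
                                std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols && x.size() == cols);
    std::vector<float> y(rows, 0.0f);

    std::size_t r = 0;
    // Tiled part: TILE rows share each load of x[j].
    for (; r + TILE <= rows; r += TILE) {
        float acc[TILE] = {0.0f, 0.0f, 0.0f, 0.0f};
        for (std::size_t j = 0; j < cols; ++j) {
            const float xj = x[j];
            for (std::size_t t = 0; t < TILE; ++t) {
                acc[t] += a[(r + t) * cols + j] * xj;
            }
        }
        for (std::size_t t = 0; t < TILE; ++t) {
            y[r + t] = acc[t];
        }
    }

    // Remainder rows that do not fill a whole tile.
    for (; r < rows; ++r) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < cols; ++j) {
            acc += a[r * cols + j] * x[j];
        }
        y[r] = acc;
    }
    return y;
}
```

The fixed-trip inner loop over `t` is the part a compiler can usually unroll fully, which is one plausible reading of "helps the compiler optimize this path".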
Ivan Filipov 9f77d899b7 ggml: add support for float16 input tensors in pooling operations (ggml/895)
* Add support for float16 tensors in 1d pooling operations

* Add support for float16 input tensors in 2d pooling operations

* code cleanup

remove unnecessary casting during srow ptr initialization

---------

Co-authored-by: vanaka11 <vanaka1189@gmail.com>
2024-07-27 17:43:44 +03:00
Tony Wasserka 203b7f1531 vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893)
This prevents invalid frees when destroying a partially initialized
vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
when running out of device memory.

Co-authored-by: Tony Wasserka <neobrain@users.noreply.github.com>
2024-07-27 17:43:44 +03:00
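The pattern described in the commit above can be sketched without the Vulkan SDK. Everything here is a stand-in: `FakeHandle`, `FAKE_NULL_HANDLE`, `fake_release` and `BufferSketch` are hypothetical names that only mimic the role of Vulkan handles, `VK_NULL_HANDLE` and `vk_buffer_struct`. The point is that default-initialized members let the cleanup path tell "never created" apart from "created", so a partially initialized object can be destroyed safely.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for Vulkan handle types (assumption, not the real API).
using FakeHandle = std::uint64_t;
constexpr FakeHandle FAKE_NULL_HANDLE = 0;

// Counts release calls so the behaviour can be observed.
static int g_release_calls = 0;

// Stand-in for a destroy/free call; releasing a null handle is a bug here.
void fake_release(FakeHandle h) {
    assert(h != FAKE_NULL_HANDLE);
    ++g_release_calls;
}

struct BufferSketch {
    // Members start out as the null handle instead of being left indeterminate.
    FakeHandle buffer = FAKE_NULL_HANDLE;
    FakeHandle memory = FAKE_NULL_HANDLE;

    BufferSketch() = default;
    // Non-copyable so a handle is never released twice.
    BufferSketch(const BufferSketch&) = delete;
    BufferSketch& operator=(const BufferSketch&) = delete;

    ~BufferSketch() {
        // Only release what was actually created.
        if (memory != FAKE_NULL_HANDLE) fake_release(memory);
        if (buffer != FAKE_NULL_HANDLE) fake_release(buffer);
    }
};
```

If creation fails after `buffer` is set but before `memory` is allocated, the destructor releases only `buffer`.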
Borislav Stanimirov d2b851bfa1 cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (ggml/885) 2024-07-27 17:43:44 +03:00
Daniel Bevenius c12b6e8ee7 ggml : remove unnecessary UNUSED macro call (ggml/880)
This commit removes an UNUSED macro call that is not needed as the
variable n0 is used in the code and will not produce a warning.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-07-27 17:43:44 +03:00
wangshuai09 bfb4c74981
cann: Fix Multi-NPU execution error (#8710)
* cann: fix multi-npu exec error

* cann: update comment for ggml_backend_cann_supports_buft
2024-07-27 16:36:44 +08:00
hongruichen e33b5c9837 refactoring: print the name of unsupported op 2024-07-27 13:49:49 +08:00
hongruichen 8ab1f15fe3 refactoring: remove internal functions, use op table directly 2024-07-27 13:43:07 +08:00
hongruichen e0c9b34016 feat: check if dims equal for add
It looks like QNN add can only be applied to matrices with equal dimensions.
2024-07-27 13:38:12 +08:00
hongruichen 5da73f8085 refactoring: move forward and supports_op into ops file 2024-07-27 13:24:57 +08:00
hongruichen 867c91bfaf feat: add error string for QnnOpPackage_Error_t 2024-07-27 13:24:57 +08:00
hongruichen ccfec70106 refactoring: remove unused get_rpcmem_from_memhandle func 2024-07-27 13:24:57 +08:00
hongruichen 2c73791d62 refactoring: remove dup code 2024-07-27 10:48:09 +08:00
slaren 2b1f616b20
ggml : reduce hash table reset cost (#8698)
* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string
2024-07-27 04:41:55 +02:00
hongruichen 18aa6654d5 refactoring: opt graph key gen 2024-07-27 10:39:07 +08:00
hongruichen be9a8c73a0 fix: suppress warning 2024-07-26 23:07:25 +08:00
hongruichen 47735cb589 fix: try to fix error in 2nd run by appending dimensions to graph key 2024-07-26 23:04:53 +08:00
hongruichen ee305cc171 refactoring: split qnn rpc buffer into dedicated class 2024-07-26 22:52:23 +08:00
DavidKorczynski 49ce0ab6d4
ggml: handle ggml_init failure to fix NULL pointer deref (#8692)
`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer dereference will happen later in the code during a call to `ggml_set_no_alloc`.

This fixes it by bailing out if no context is found.
2024-07-25 23:23:05 +02:00
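A minimal sketch of the failure mode and the fix described above, using a toy context pool instead of the real ggml API. All names (`ContextSketch`, `context_init`, `context_free`, `load_metadata`, `MAX_CONTEXTS`) are hypothetical. The only assumption carried over from the commit message is that the init call returns a null pointer when no unused context is left, and that the caller must check for that before touching the context.

```cpp
#include <cassert>
#include <cstddef>

// Toy context with an in-use flag and one setting (illustrative only).
struct ContextSketch {
    bool used = false;
    bool no_alloc = false;
};

// Hypothetical fixed-size pool of contexts.
constexpr std::size_t MAX_CONTEXTS = 2;
static ContextSketch g_contexts[MAX_CONTEXTS];

// Returns an unused context, or nullptr if the pool is exhausted.
ContextSketch* context_init() {
    for (std::size_t i = 0; i < MAX_CONTEXTS; ++i) {
        if (!g_contexts[i].used) {
            g_contexts[i].used = true;
            return &g_contexts[i];
        }
    }
    return nullptr;
}

// Returns a context to the pool; freeing an unused context is a bug.
void context_free(ContextSketch* ctx) {
    if (ctx == nullptr) return;
    assert(ctx->used);
    ctx->used = false;
    ctx->no_alloc = false;
}

// Caller that bails out instead of dereferencing a null context.
ContextSketch* load_metadata() {
    ContextSketch* ctx = context_init();
    if (ctx == nullptr) {
        return nullptr;  // without this check, the next line would crash
    }
    ctx->no_alloc = true;
    return ctx;
}
```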
Andreas (Andi) Kunar bf5a81df37
ggml : fix build on Windows with Snapdragon X (#8531)
* Improvements for Windows with Snapdragon X

* Revert "Improvements for Windows with Snapdragon X"

This reverts commit bf21397ae5.

* Improvements for Windows with Snapdragon X

* WOA build clarifications

* Windows on ARM build clarifications

* cmake build for Windows clarifications

* Update docs/build.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-25 19:01:00 +03:00
hongruichen f843e5aaf5 fix: 1. free up rpc memory at destruction
2. unbind tensor
2024-07-25 23:45:04 +08:00
Chen Xi ed67bcb24f
[SYCL] fix multi-gpu issue on sycl (#8554)
---------

Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
2024-07-25 19:45:18 +08:00
Georgi Gerganov eddcb5238b
ggml : add and use ggml_cpu_has_llamafile() (#8664) 2024-07-25 12:37:42 +03:00
Joe Todd 79167d9e49
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (#8667) 2024-07-24 11:55:26 +01:00
Joe Todd 64cf50a0ed
sycl : Add support for non-release DPC++ & oneMKL (#8644)
* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <joe.todd@codeplay.com>
2024-07-23 14:58:37 +01:00
0cc4m 751fcfc6c3
Vulkan IQ4_NL Support (#8613)
* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
2024-07-23 10:56:49 +02:00
Jeroen Mostert 46e47417aa
Allow all RDNA2 archs to use sdot4 intrinsic (#8629)
The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.
2024-07-23 10:50:40 +02:00
luoyu-intel 063d99ad11
[SYCL] fix scratch size of softmax (#8642) 2024-07-23 15:43:28 +08:00
hongruichen 706793f078 fix: back to qnn tensor v1 to fix the create tensor error 2024-07-22 23:08:38 +08:00
hongruichen 3b47056c97 refactoring: change the tensor binding mode between qnn tensor and ggml tensor 2024-07-22 23:08:38 +08:00
Mark Zhuang 04bab6b7da
ggml: fix compile error for RISC-V (#8623) 2024-07-22 10:56:45 +03:00
Johannes Gäßler 69c487f4ed
CUDA: MMQ code deduplication + iquant support (#8495)
* CUDA: MMQ code deduplication + iquant support

* 1 less parallel job for CI build
2024-07-20 22:25:26 +02:00
Georgi Gerganov 07283b1a90
gguf : handle null name during init (#8587) 2024-07-20 17:15:42 +03:00
hongruichen b173c4e061 feat: update tensor name when bind to graph 2024-07-20 17:31:40 +08:00
hongruichen 5f3b1ae3b0 fix: try to fix graph cache by appending the tensor names 2024-07-20 16:39:06 +08:00
hongruichen 51f95d6980 fix: dimension could be wrong for tensors like 1x1x8 2024-07-20 16:11:35 +08:00
hongruichen 27299463ae fix: try to fix tensor type error 2024-07-20 15:13:10 +08:00
hongruichen 28a00e5e6c fix: try to fix QNN_GRAPH_ERROR_INVALID_OP_CONFIG 2024-07-20 14:11:58 +08:00
hongruichen 1679dcf47e fix: check all dimensions in `can offload` 2024-07-20 13:29:01 +08:00
slaren 87e397d00b
ggml : fix quant dot product with odd number of blocks (#8549)
* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix odd blocks for ARM_NEON (#8556)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition

---------

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-19 17:17:27 +02:00
hongruichen b1b5cc10b1 add function to convert qnn error into string 2024-07-19 22:51:17 +08:00
Clint Herron b57eb9ca4f
ggml : add friendlier error message to fopen errors (#8575)
* Add additional error information when model files fail to load.

* Adding additional error information to most instances of fopen.
2024-07-19 14:05:45 +03:00
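One way to attach the extra error information the commit above describes is to capture `errno` right after a failed `fopen` and include both the path and the `strerror` text in the message. This is a sketch under that assumption; the helper name `try_open_for_read` and the message wording are hypothetical and are not taken from the ggml sources.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: returns an empty string if the file can be opened
// for reading, otherwise a message naming the path and the OS-level reason.
std::string try_open_for_read(const char* path) {
    assert(path != nullptr);
    errno = 0;
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) {
        const int err = errno;  // save before any other call can overwrite it
        return std::string("failed to open ") + path + ": " + std::strerror(err);
    }
    std::fclose(f);
    return std::string();
}
```

The exact text produced by `std::strerror` depends on the platform, so callers should treat the message as human-readable only.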
hongruichen a607995f95 Reapply "tried fix the add node error 6005"
This reverts commit f45fbec8f4.
2024-07-19 15:35:55 +08:00
hongruichen 0153a23d3f fix support ops
This reverts commit f45fbec8f4.
2024-07-19 15:31:29 +08:00
hongruichen f45fbec8f4 Revert "tried fix the add node error 6005"
This reverts commit ce3d09e5f2.
2024-07-19 12:59:38 +08:00