Commit Graph

85 Commits

Author SHA1 Message Date
hongruichen ce3d09e5f2 tried fix the add node error 6005 2024-07-19 12:59:21 +08:00
hongruichen 665f823748 fix op checker 2024-07-18 22:26:53 +08:00
hongruichen 15f5cc450c bug: fix allocation size overflow at log 2024-07-18 19:44:05 +08:00
hongruichen d82b3a0bdb feat: add GGML_UNARY_OP_GELU 2024-07-18 11:15:48 +08:00
hongruichen ce199b2de7 refactoring: downgrade some log to debug level 2024-07-17 23:49:47 +08:00
hongruichen c76fc9aa2f fix warnings 2024-07-17 23:32:13 +08:00
hongruichen 6457a68bd7 disable qnn profiling in release build 2024-07-17 23:24:29 +08:00
hongruichen 2502b57203 fix warnings 2024-07-17 22:10:12 +08:00
hongruichen 454deef83c register qnn backend 2024-07-17 21:25:55 +08:00
hongruichen eed960575f add build step of QNN backend at ggml 2024-07-17 19:43:01 +08:00
hongruichen 861bb9c580 Merge tag 'b3405' into dev-refactoring 2024-07-17 17:13:55 +08:00
hongruichen bb13795dce refactoring: remove unused functions and variables 2024-07-17 14:17:35 +08:00
hongruichen 63dc587dff refactoring: make the buffer alloc and free stay in same class 2024-07-17 14:08:31 +08:00
hongruichen b1ef302991 refactoring: remove depend of dlsym at utils.hpp 2024-07-17 12:21:33 +08:00
Johannes Gäßler 5e116e8dd5
make/cmake: add missing force MMQ/cuBLAS for HIP (#8515) 2024-07-16 21:20:59 +02:00
hongruichen 0301b500cd refactoring: prevent leak the QNN_INTERFACE_VER_TYPE and QNN_SYSTEM_INTERFACE_VER_TYPE outside of qnn.hpp 2024-07-17 00:18:38 +08:00
Xuan Son Nguyen 97bdd26eee
Refactor lora adapter support (#8332)
* lora: load to devide buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <slarengh@gmail.com>

* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-15 20:50:47 +02:00
Daniel Bevenius 8fac431b06
ggml : suppress unknown pragma 'GCC' on windows (#8460)
This commit adds a macro guard to pragma GCC to avoid the following
warning on windows:

```console
C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068:
unknown pragma 'GCC' [C:\lama.cpp\build\ggml\src\ggml.vcxproj]
```
2024-07-15 15:48:17 +03:00
Meng, Hengyu 16bdfa42ac
[SYCL] add concat through dim 1/2 (#8483)
* add concat through dim 1/2
2024-07-15 19:32:15 +08:00
0cc4m bda62d7999
Vulkan MMQ Fix (#8479)
* Fix incoherence by adding missing LOAD_VEC_A parameter

* Fix Vulkan op result checker build error
2024-07-15 09:38:52 +02:00
hongruichen f32327e2b2 remove multiply declearation of log in unit test 2024-07-15 12:06:12 +08:00
hongruichen 30b40006cc remove unused declarations 2024-07-14 23:50:11 +08:00
hongruichen 148ceab70c add log op 2024-07-14 23:00:50 +08:00
bandoti 17eb6aa8a9
vulkan : cmake integration (#8119)
* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg
2024-07-13 18:12:39 +02:00
Georgi Gerganov c917b67f06
metal : template-ify some of the kernels (#8447)
ggml-ci
2024-07-13 18:32:33 +03:00
hongruichen 100ccd5e7f add unary op template and more ops 2024-07-13 00:55:34 +08:00
hongruichen e3aa43adbd suppress warning 2024-07-12 23:26:11 +08:00
hongruichen f0894d897a wip
wip
2024-07-12 19:57:34 +08:00
Georgi Gerganov 370b1f7e7a
ggml : minor naming changes (#8433)
* ggml : minor naming changes

ggml-ci

* ggml : use PRId64 [no ci]

* ggml : revert FA K/Q names
2024-07-12 10:46:02 +03:00
Chen Xi b549a1bbef
[SYCL] fix the mul_mat_id ut issues (#8427)
* fix part of mul_mat_id

* skip the bfloat 16 sycl ut

Signed-off-by: Chen Xi <xi2chen@intel.com>

---------

Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
Co-authored-by: Chen Xi <xi2chen@intel.com>
2024-07-12 08:52:04 +08:00
Nicholai Tukanov 368645698a
ggml : add NVPL BLAS support (#8329) (#8425)
* ggml : add NVPL BLAS support

* ggml : replace `<BLASLIB>_ENABLE_CBLAS` with `GGML_BLAS_USE_<BLASLIB>`

---------

Co-authored-by: ntukanov <ntukanov@nvidia.com>
2024-07-11 18:49:15 +02:00
Daniel Bevenius b078c619aa
cuda : suppress 'noreturn' warn in no_device_code (#8414)
* cuda : suppress 'noreturn' warn in no_device_code

This commit adds a while(true) loop to the no_device_code function in
common.cuh. This is done to suppress the warning:

```console
/ggml/src/ggml-cuda/template-instances/../common.cuh:346:1: warning:
function declared 'noreturn' should not return [-Winvalid-noreturn]
  346 | }
      | ^
```

The motivation for this is to reduce the number of warnings when
compilng with GGML_HIPBLAS=ON.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! cuda : suppress 'noreturn' warn in no_device_code

Update __trap macro instead of using a while loop to suppress the
warning.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-07-11 17:53:42 +02:00
Johannes Gäßler 808aba3916
CUDA: optimize and refactor MMQ (#8416)
* CUDA: optimize and refactor MMQ

* explicit q8_1 memory layouts, add documentation
2024-07-11 16:47:47 +02:00
hongruichen be3aa9631f use template function directly 2024-07-11 11:18:06 +08:00
hongruichen 8932135fdb add sqrt and mul ops 2024-07-11 00:08:08 +08:00
hongruichen 7ea28a6fac add helper function for binary op 2024-07-10 23:39:03 +08:00
AidanBeltonS f4444d992c
[SYCL] Use multi_ptr to clean up deprecated warnings (#8256) 2024-07-10 16:10:49 +01:00
hongruichen b6f29273f0 add function to get graph from cache 2024-07-10 23:08:32 +08:00
Georgi Gerganov 6b2a849d1f
ggml : move sgemm sources to llamafile subfolder (#8394)
ggml-ci
2024-07-10 15:23:29 +03:00
Dibakar Gope 0f1a39f343
ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780)
* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files

* Arm AArch64: minor code refactoring for rebase

* Arm AArch64: minor code refactoring for resolving a build issue with cmake

* Arm AArch64: minor code refactoring to split the Q4_0_AARC64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code change for resolving a build issue with server-windows

* retrigger checks

* Arm AArch64: minor code changes for rebase

* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits

* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig

* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code refactoring

* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat

* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat

* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types

* Arm AArch64: minor code refactoring

* Arm AArch64: minor code refactoring

* Arm AArch64: minor code refactoring

* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure

* Arm AArch64: remove a redundant comment

* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off

* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels

* Arm AArch64: update docs/build.md README to include compile time flags for buiilding the Q4_0_4_4 quant type
2024-07-10 15:14:51 +03:00
hongruichen 80051cfc4d remove unused variables 2024-07-10 19:57:47 +08:00
hongruichen b49b501e26 fix sprintf type 2024-07-10 19:48:57 +08:00
hongruichen 3feb574bf0 merge register_rpc_mem into alloc_rpc_mem 2024-07-10 19:40:02 +08:00
hongruichen e97d3a6c48 fix tensor buffer allocation
add log

commit qnn buffer after changed

add log

register_rpc_mem 2 times

update input tensors before graph finalize

default to QNN_TENSORMEMTYPE_RAW

set new tensors at execute

move write input tensors to exec

check if mem registered before actual do

register rpc mem once allocated
2024-07-10 19:32:39 +08:00
hongruichen dc7d83e121 add log 2024-07-10 00:33:23 +08:00
hongruichen 9add256efe use helper function instead 2024-07-10 00:31:39 +08:00
hongruichen a7be0693ba add log 2024-07-10 00:29:43 +08:00
hongruichen af869fd636 fix compiling error in debug build 2024-07-10 00:23:51 +08:00
Alberto Cabrera Pérez 5b0b8d8cfb
sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372)
* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend

* Reduced verbosity of comment
2024-07-09 22:03:15 +08:00
Hongrui Chen 5f2e3918f6 refactoring ggml_qnn_tensor 2024-07-09 19:58:46 +08:00