Commit Graph

110 Commits

Author SHA1 Message Date
hongruichen 1f9d2a7e22 refactoring: improve tensor print 2024-07-28 22:05:51 +08:00
hongruichen e33b5c9837 refactoring: print the name of unsupported op 2024-07-27 13:49:49 +08:00
hongruichen 8ab1f15fe3 refactoring: remove internal functions, use op table directly 2024-07-27 13:43:07 +08:00
hongruichen e0c9b34016 feat: check if dims equal for add
looks like qnn add can only be applied to matrices with equal dimensions
2024-07-27 13:38:12 +08:00
hongruichen 5da73f8085 refactoring: move forward and supports_op into ops file 2024-07-27 13:24:57 +08:00
hongruichen 867c91bfaf feat: add error string for QnnOpPackage_Error_t 2024-07-27 13:24:57 +08:00
hongruichen ccfec70106 refactoring: remove unused get_rpcmem_from_memhandle func 2024-07-27 13:24:57 +08:00
hongruichen 2c73791d62 refactoring: remove dup code 2024-07-27 10:48:09 +08:00
hongruichen 18aa6654d5 refactoring: opt graph key gen 2024-07-27 10:39:07 +08:00
hongruichen be9a8c73a0 fix: suppress warning 2024-07-26 23:07:25 +08:00
hongruichen 47735cb589 fix: try fix error in 2nd run by appending dimension into graph key 2024-07-26 23:04:53 +08:00
hongruichen ee305cc171 refactoring: split qnn rpc buffer into dedicated class 2024-07-26 22:52:23 +08:00
hongruichen f843e5aaf5 fix: 1. free up rpc memory at destruct
2. unbind tensor
2024-07-25 23:45:04 +08:00
hongruichen 706793f078 fix: back to qnn tensor v1 to fix the create tensor error 2024-07-22 23:08:38 +08:00
hongruichen 3b47056c97 refactoring: change the tensor binding mode between qnn tensor and ggml tensor 2024-07-22 23:08:38 +08:00
hongruichen b173c4e061 feat: update tensor name when bind to graph 2024-07-20 17:31:40 +08:00
hongruichen 5f3b1ae3b0 fix: try to fix graph cache by appending the tensor names 2024-07-20 16:39:06 +08:00
hongruichen 51f95d6980 fix: dimension could be wrong for tensors like 1x1x8 2024-07-20 16:11:35 +08:00
hongruichen 27299463ae fix: try fix tensor type error 2024-07-20 15:13:10 +08:00
hongruichen 28a00e5e6c fix: try fix QNN_GRAPH_ERROR_INVALID_OP_CONFIG 2024-07-20 14:11:58 +08:00
hongruichen 1679dcf47e fix: check all dimensions in `can offload` 2024-07-20 13:29:01 +08:00
hongruichen b1b5cc10b1 add function to convert qnn error into string 2024-07-19 22:51:17 +08:00
hongruichen a607995f95 Reapply "tried fix the add node error 6005"
This reverts commit f45fbec8f4.
2024-07-19 15:35:55 +08:00
hongruichen 0153a23d3f fix support ops
This reverts commit f45fbec8f4.
2024-07-19 15:31:29 +08:00
hongruichen f45fbec8f4 Revert "tried fix the add node error 6005"
This reverts commit ce3d09e5f2.
2024-07-19 12:59:38 +08:00
hongruichen ce3d09e5f2 tried fix the add node error 6005 2024-07-19 12:59:21 +08:00
hongruichen 665f823748 fix op checker 2024-07-18 22:26:53 +08:00
hongruichen 15f5cc450c bug: fix allocation size overflow at log 2024-07-18 19:44:05 +08:00
hongruichen d82b3a0bdb feat: add GGML_UNARY_OP_GELU 2024-07-18 11:15:48 +08:00
hongruichen ce199b2de7 refactoring: downgrade some log to debug level 2024-07-17 23:49:47 +08:00
hongruichen c76fc9aa2f fix warnings 2024-07-17 23:32:13 +08:00
hongruichen 6457a68bd7 disable qnn profiling in release build 2024-07-17 23:24:29 +08:00
hongruichen 2502b57203 fix warnings 2024-07-17 22:10:12 +08:00
hongruichen 454deef83c register qnn backend 2024-07-17 21:25:55 +08:00
hongruichen eed960575f add build step of QNN backend at ggml 2024-07-17 19:43:01 +08:00
hongruichen 861bb9c580 Merge tag 'b3405' into dev-refactoring 2024-07-17 17:13:55 +08:00
hongruichen bb13795dce refactoring: remove unused functions and variables 2024-07-17 14:17:35 +08:00
hongruichen 63dc587dff refactoring: make the buffer alloc and free stay in same class 2024-07-17 14:08:31 +08:00
hongruichen b1ef302991 refactoring: remove dependency on dlsym in utils.hpp 2024-07-17 12:21:33 +08:00
Johannes Gäßler 5e116e8dd5
make/cmake: add missing force MMQ/cuBLAS for HIP (#8515) 2024-07-16 21:20:59 +02:00
hongruichen 0301b500cd refactoring: prevent leaking QNN_INTERFACE_VER_TYPE and QNN_SYSTEM_INTERFACE_VER_TYPE outside of qnn.hpp 2024-07-17 00:18:38 +08:00
Xuan Son Nguyen 97bdd26eee
Refactor lora adapter support (#8332)
* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <slarengh@gmail.com>

* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-15 20:50:47 +02:00
Daniel Bevenius 8fac431b06
ggml : suppress unknown pragma 'GCC' on windows (#8460)
This commit adds a macro guard to pragma GCC to avoid the following
warning on windows:

```console
C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068:
unknown pragma 'GCC' [C:\llama.cpp\build\ggml\src\ggml.vcxproj]
```
2024-07-15 15:48:17 +03:00
Meng, Hengyu 16bdfa42ac
[SYCL] add concat through dim 1/2 (#8483)
* add concat through dim 1/2
2024-07-15 19:32:15 +08:00
0cc4m bda62d7999
Vulkan MMQ Fix (#8479)
* Fix incoherence by adding missing LOAD_VEC_A parameter

* Fix Vulkan op result checker build error
2024-07-15 09:38:52 +02:00
hongruichen f32327e2b2 remove multiple declarations of log in unit test 2024-07-15 12:06:12 +08:00
hongruichen 30b40006cc remove unused declarations 2024-07-14 23:50:11 +08:00
hongruichen 148ceab70c add log op 2024-07-14 23:00:50 +08:00
bandoti 17eb6aa8a9
vulkan : cmake integration (#8119)
* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg
2024-07-13 18:12:39 +02:00
Georgi Gerganov c917b67f06
metal : template-ify some of the kernels (#8447)
ggml-ci
2024-07-13 18:32:33 +03:00