Commit Graph

1315 Commits

Author SHA1 Message Date
Aaron Miller 8564f79036 falcon h2d + re-enable vulkan 2023-11-03 17:22:22 -04:00
Aaron Miller 020b1745a0 vulkan: implement neox mode for rope 2023-11-03 17:22:21 -04:00
Aaron Miller ff4212d20f q8 mat*vec 2023-11-03 17:22:21 -04:00
Aaron Miller 9db90cbe12 f16 mv broadcasting fix (gqa fix) 2023-11-03 17:22:21 -04:00
Cebtenzzre 3d850db767 kompute : remove Q6_K from list of supported quant types 2023-11-03 17:22:21 -04:00
Cebtenzzre 24a4a5956a kompute : only try to use Vulkan for LLaMA itself 2023-11-03 17:22:21 -04:00
Adam Treat bc4b5ed1cb Fixes for subgroup size to bring AMD and NVIDIA in line with each other for all kernels. 2023-11-03 17:22:21 -04:00
Adam Treat de589ced7c Change this back to be in agreement with metal and our previous softmax kernel. 2023-11-03 17:22:21 -04:00
Adam Treat 6ac39752bf Fix up the upstream CMakeLists.txt so we can build just llama.cpp with our branch. 2023-11-03 17:22:21 -04:00
Adam Treat 32289aa447 Fixes for norm. 2023-11-03 17:22:21 -04:00
Adam Treat 06d4b21598 Fix offset into the qh and now we have working Vulkan acceleration for gguf'd llama. 2023-11-03 17:22:21 -04:00
Adam Treat f1c9bc1821 Add q6_k getrows and mul*vec kernel. 2023-11-03 17:22:21 -04:00
Adam Treat 4b223ec432 Refactor getrows to use common code and get ready for q6_k. 2023-11-03 17:22:21 -04:00
Adam Treat 5509f74318 Minor cleanup. 2023-11-03 17:22:21 -04:00
Adam Treat 601905e75e Move the subgroups and printf into common. 2023-11-03 17:22:21 -04:00
Adam Treat 93306f16d0 Consolidate code for mat x vec kernels and use subgroups more extensively. 2023-11-03 17:22:21 -04:00
Adam Treat 77135a3bf5 Add common boilerplate code via an include and eliminate copy-paste. 2023-11-03 17:22:21 -04:00
Adam Treat 9e4f8b4acc Upload immediately to device. 2023-11-03 17:22:21 -04:00
Cebtenzzre 6b6c73a9e3 kompute : don't fail build because of -Warray-bounds
There are some warnings in debug builds that are likely to be false
positives.
2023-11-03 17:22:21 -04:00
Adam Treat 1b1416d7b7 Support for gguf. 2023-11-03 17:22:20 -04:00
Adam Treat 2c24d67e7b Don't crash on available devices if we can't even create an instance. 2023-10-05 13:39:18 -04:00
Adam Treat addac25293 Set the singleton to nullptr here. 2023-10-05 13:39:18 -04:00
Adam Treat 68aca6be08 Only use Vulkan with known quants that work. 2023-10-05 13:39:18 -04:00
Adam Treat 4ed25b2f88 Sync from device back to host at begin of new prompt. 2023-10-05 13:39:18 -04:00
Adam Treat bd5f6399bb Don't try to install kompute artifacts. 2023-10-05 13:39:18 -04:00
Aaron Miller 8bea719879 vulkan: disambiguate gpus with the same name 2023-10-05 13:39:18 -04:00
Adam Treat 68cf1df6fb Throw an exception when allocation fails for vulkan. 2023-10-05 13:39:18 -04:00
Aaron Miller beee57266f Make kompute actually include external SDK headers when requested 2023-10-05 13:39:18 -04:00
Adam Treat b7e2e691d4 Completely revamp how we do object management with the vulkan backend and
stop using so many static objects so we can tear down and bring up vulkan
on new devices in the same runtime.
2023-10-05 13:39:18 -04:00
Adam Treat 45c8778b49 Switch to a dynamic dispatch table instead of linking hard against libvulkan. 2023-10-05 13:39:18 -04:00
Aaron Miller 8563fa001f remove dynamic deps from kompute build
should no longer have new external deps other than libvulkan

```
ubuntu@ip-172-31-1-24:~/repo/gpt4all/gpt4all-backend/build$ ldd ./libllamamodel-mainline-avxonly.so
        linux-vdso.so.1 (0x00007ffcb53bb000)
        libvulkan.so.1 => /lib/x86_64-linux-gnu/libvulkan.so.1 (0x00007f239dab5000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f239d800000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f239d719000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f239da95000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f239d400000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f239dd1d000)
```
2023-10-05 13:39:18 -04:00
Adam Treat 48a45ea435 Remove warning which fails on windows. 2023-10-05 13:39:18 -04:00
niansa ba15dfd0be Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. 2023-10-05 13:39:18 -04:00
Kevin Ji 45855b3f1c docs : mark code as Bash (#3375) 2023-09-28 09:11:32 -04:00
Pierre Alexandre SCHEMBRI 4aea3b846e readme : add Mistral AI release 0.1 (#3362) 2023-09-28 15:13:37 +03:00
slaren da0400344b ggml-cuda : perform cublas fp16 matrix multiplication as fp16 (#3370)
* ggml-cuda : perform cublas fp16 matrix multiplication as fp16

* try to fix rocm build

* restrict fp16 mat mul to volta and up
2023-09-28 13:08:28 +03:00
Zhang Peiyuan e519621010 convert : remove bug in convert.py permute function (#3364) 2023-09-27 20:45:20 +02:00
Richard Roberson ac43576124 make-ggml.py : compatibility with more models and GGUF (#3290)
* Resync my fork with new llama.cpp commits

* examples : rename to use dash instead of underscore

* New model conversions

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-27 19:25:12 +03:00
Cebtenzzre 20c7e1e804 gguf : fix a few general keys (#3341) 2023-09-27 12:18:07 -04:00
Rickard Hallerbäck dc6897404e metal : reusing llama.cpp logging (#3152)
* metal : reusing llama.cpp logging

* cmake : build fix

* metal : logging callback

* metal : logging va_args memory fix

* metal : minor cleanup

* metal : setting function like logging macro to capital letters

* llama.cpp : trailing whitespace fix

* ggml : log level enum used by llama

* Makefile : cleanup ggml-metal recipe

* ggml : ggml_log_callback typedef

* ggml : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-27 18:48:33 +03:00
Jag Chadha 527e57cfd8 build : add ACCELERATE_NEW_LAPACK to fix warning on macOS Sonoma (#3342) 2023-09-27 18:34:32 +03:00
BarfingLemurs ffe88a36a9 readme : add some recent perplexity and bpw measurements to READMEs, link for k-quants (#3340)
* Update README.md

* Update README.md

* Update README.md with k-quants bpw measurements
2023-09-27 18:30:36 +03:00
DAN™ 99115f3fa6 cmake : fix build-info.h on MSVC (#3309) 2023-09-25 18:45:33 -04:00
2f38b454 1726f9626f docs: Fix typo CLBlast_DIR var. (#3330) 2023-09-25 20:24:52 +02:00
Erik Scholz a98b1633d5 nix : add cuda, use a symlinked toolkit for cmake (#3202) 2023-09-25 13:48:30 +02:00
slaren c091cdfb24 llama-bench : add README (#3317)
* llama-bench : add README

* minor edit
2023-09-23 21:48:24 +02:00
Cebtenzzre 51a7cf5c6e examples : fix RoPE defaults to match PR #3240 (#3315) 2023-09-23 12:28:50 +03:00
Kevin Ji bedb92b603 scripts : use `/usr/bin/env` in shebang (#3313) 2023-09-22 23:52:23 -04:00
Lee Drake bc9d3e3971 Update README.md (#3289)
* Update README.md

* Update README.md

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2023-09-21 21:00:24 +02:00
shibe2 36b904e200 ggml-opencl.cpp: Make private functions static (#3300) 2023-09-21 14:10:26 -04:00