Commit Graph

1584 Commits

Author SHA1 Message Date
Adam Treat 74ddf0f17d Fix synchronization problem for AMD Radeon with the amdvlk driver or Windows
drivers. Does not have any performance or fidelity effect on other GPU/driver
combos I've tested.

FIXES: https://github.com/nomic-ai/gpt4all/issues/1507
2023-11-03 17:22:22 -04:00
Adam Treat 8d9efbf97a Lower the workgroup count for some shaders by providing a loop that processes
four floats at a time.
2023-11-03 17:22:22 -04:00
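The dispatch-side effect is simple arithmetic; a minimal sketch, assuming a hypothetical local size of 256: with each invocation looping over four floats, the workgroup count shrinks by the same factor.

```cpp
#include <cstdint>

// Hypothetical local_size_x; the real shaders' value may differ.
constexpr uint32_t LOCAL_SIZE = 256;
// Each invocation now loops over four floats instead of one.
constexpr uint32_t FLOATS_PER_INVOCATION = 4;

// Number of workgroups to dispatch for n elements (ceiling division).
static uint32_t workgroup_count(uint32_t n) {
    const uint32_t per_group = LOCAL_SIZE * FLOATS_PER_INVOCATION;
    return (n + per_group - 1) / per_group;
}
```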
Adam Treat 752f7ebd61 Remove unused push constant that was giving validation errors. 2023-11-03 17:22:22 -04:00
Adam Treat 8400015337 Don't try an allocation on a heap that is smaller than the size we require. 2023-11-03 17:22:22 -04:00
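A sketch of the guard described here, using the standard Vulkan memory-properties query; the helper name is an assumption.

```cpp
#include <vulkan/vulkan.h>

// Hypothetical helper: reject a memory type whose backing heap is smaller
// than the requested allocation, so vkAllocateMemory is never attempted
// on a heap that cannot satisfy it.
static bool heap_can_fit(VkPhysicalDevice phys_dev,
                         uint32_t memory_type_index, VkDeviceSize required) {
    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(phys_dev, &props);
    const uint32_t heap = props.memoryTypes[memory_type_index].heapIndex;
    return props.memoryHeaps[heap].size >= required;
}
```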
cebtenzzre cbc0d1af79 kompute : make scripts executable 2023-11-03 17:22:22 -04:00
cebtenzzre 21841d3163 kompute : enable kp_logger and make it static (#8) 2023-11-03 17:22:22 -04:00
Aaron Miller cc05a602d6 use mat*vec shaders for mat*mat
I wrote the mat*mat shaders from scratch so I understand them better, but
they are currently significantly slower than just multiply-invoking the
mat*vec shaders. So, except for f32, which needed a new shader, revert to
the mat*vec ones here.
2023-11-03 17:22:22 -04:00
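A CPU sketch of the multiply-invoking idea, with all names hypothetical: C = A*B computed as one mat*vec product per column of B.

```cpp
#include <cstddef>

// y = A * x, with A stored row-major as rows x cols.
static void mat_vec(const float * A, const float * x, float * y,
                    int rows, int cols) {
    for (int r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (int c = 0; c < cols; ++c) {
            acc += A[(size_t)r * cols + c] * x[c];
        }
        y[r] = acc;
    }
}

// C = A * B realized as n_vecs independent mat*vec products, one per
// column of B (here stored as n_vecs contiguous columns of length cols).
static void mat_mat(const float * A, const float * B, float * C,
                    int rows, int cols, int n_vecs) {
    for (int j = 0; j < n_vecs; ++j) {
        mat_vec(A, B + (size_t)j * cols, C + (size_t)j * rows, rows, cols);
    }
}
```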
Aaron Miller c1fd64548d attempted speedups 2 2023-11-03 17:22:22 -04:00
Aaron Miller 9bc52ebae3 attempted speedups 2023-11-03 17:22:22 -04:00
Aaron Miller 8dc79ac380 clean up vulkan/cpu switch 2023-11-03 17:22:22 -04:00
Aaron Miller cd0257ed0d q4_1 mat*mat 2023-11-03 17:22:22 -04:00
Aaron Miller 4809890d80 rm commented dbg print 2023-11-03 17:22:22 -04:00
Aaron Miller b78a94bc6d q6k mm works 2023-11-03 17:22:22 -04:00
Aaron Miller d5741c07a5 use op param epsilon for norms 2023-11-03 17:22:22 -04:00
Aaron Miller 3327d84a7f perf: use bigger threadgroups in mm 2023-11-03 17:22:22 -04:00
Aaron Miller 46385ee0d5 misc vulkan cleanup
make push constants consistent with dispatch, avoid a double free
2023-11-03 17:22:22 -04:00
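The exact mechanism is not shown in the message; a generic sketch of the usual double-free guard in Vulkan cleanup code, with a hypothetical helper name:

```cpp
#include <vulkan/vulkan.h>

// Free once, then null the handle so a second cleanup pass is a no-op.
static void free_device_memory(VkDevice device, VkDeviceMemory * memory) {
    if (*memory != VK_NULL_HANDLE) {
        vkFreeMemory(device, *memory, nullptr);
        *memory = VK_NULL_HANDLE;
    }
}
```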
Aaron Miller f0cd38b9ad add mat*mat ops 2023-11-03 17:22:22 -04:00
Adam Treat 09d83f0401 Delete TODO now that we have q8_0. 2023-11-03 17:22:22 -04:00
Aaron Miller 8564f79036 falcon h2d + reenable vulkan 2023-11-03 17:22:22 -04:00
Aaron Miller 020b1745a0 vulkan: implement neox mode for rope 2023-11-03 17:22:21 -04:00
Aaron Miller ff4212d20f q8 mat*vec 2023-11-03 17:22:21 -04:00
Aaron Miller 9db90cbe12 f16 mv broadcasting fix (gqa fix) 2023-11-03 17:22:21 -04:00
Cebtenzzre 3d850db767 kompute : remove Q6_K from list of supported quant types 2023-11-03 17:22:21 -04:00
Cebtenzzre 24a4a5956a kompute : only try to use Vulkan for LLaMA itself 2023-11-03 17:22:21 -04:00
Adam Treat bc4b5ed1cb Fixes for subgroup size to bring AMD and NVIDIA in line with each other for all kernels. 2023-11-03 17:22:21 -04:00
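The usual source of such divergence is subgroup width (32 on NVIDIA, 32 or 64 on AMD). The commit's actual fix is not shown; a sketch of querying the size at runtime via the Vulkan 1.1 properties chain:

```cpp
#include <vulkan/vulkan.h>

// Query the device's subgroup size through VkPhysicalDeviceProperties2.
static uint32_t device_subgroup_size(VkPhysicalDevice phys_dev) {
    VkPhysicalDeviceSubgroupProperties subgroup = {};
    subgroup.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SUBGROUP_PROPERTIES;

    VkPhysicalDeviceProperties2 props = {};
    props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props.pNext = &subgroup;

    vkGetPhysicalDeviceProperties2(phys_dev, &props);
    return subgroup.subgroupSize;
}
```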
Adam Treat de589ced7c Change this back to be in agreement with metal and our previous softmax kernel. 2023-11-03 17:22:21 -04:00
Adam Treat 6ac39752bf Fix up the upstream CMakeLists.txt so we can build just llama.cpp with our branch. 2023-11-03 17:22:21 -04:00
Adam Treat 32289aa447 Fixes for norm. 2023-11-03 17:22:21 -04:00
Adam Treat 06d4b21598 Fix offset into the qh; now we have working Vulkan acceleration for GGUF'd llama. 2023-11-03 17:22:21 -04:00
Adam Treat f1c9bc1821 Add q6_k getrows and mul*vec kernel. 2023-11-03 17:22:21 -04:00
Adam Treat 4b223ec432 Refactor getrows to use common code and get ready for q6_k. 2023-11-03 17:22:21 -04:00
Adam Treat 5509f74318 Minor cleanup. 2023-11-03 17:22:21 -04:00
Adam Treat 601905e75e Move the subgroups and printf into common. 2023-11-03 17:22:21 -04:00
Adam Treat 93306f16d0 Consolidate code for mat x vec kernels and use subgroups more extensively. 2023-11-03 17:22:21 -04:00
Adam Treat 77135a3bf5 Add common boilerplate code via an include and eliminate copy-paste. 2023-11-03 17:22:21 -04:00
Adam Treat 9e4f8b4acc Upload immediately to device. 2023-11-03 17:22:21 -04:00
Cebtenzzre 6b6c73a9e3 kompute : don't fail build because of -Warray-bounds
There are some warnings in debug builds that are likely to be false
positives.
2023-11-03 17:22:21 -04:00
Adam Treat 1b1416d7b7 Support for gguf. 2023-11-03 17:22:20 -04:00
Peter Sugihara d9b33fe95b
metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938) 2023-11-03 21:18:18 +02:00
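The fix is plain alignment arithmetic; a sketch (whether a buffer length or a thread count is what gets rounded is not stated in the message):

```cpp
#include <cstdint>

// Round n up to the next multiple of 16; valid because 16 is a power of two.
static uint32_t round_up_16(uint32_t n) {
    return (n + 15u) & ~15u;
}
// e.g. round_up_16(37) == 48, round_up_16(48) == 48
```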
Xiao-Yong Jin 5ba3746171
ggml-metal: fix yarn rope (#3937) 2023-11-03 14:00:31 -04:00
slaren abb77e7319
ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) 2023-11-03 12:13:09 +01:00
Georgi Gerganov 8f961abdc4
speculative : change default p_accept to 0.5 + CLI args (#3919)
ggml-ci
2023-11-03 09:41:56 +02:00
Georgi Gerganov 05816027d6
common : YAYF (yet another YARN fix) (#3925)
ggml-ci
2023-11-03 09:24:00 +02:00
cebtenzzre 3fdbe6b66b
llama : change yarn_ext_factor placeholder to -1 (#3922) 2023-11-03 08:31:58 +02:00
Kerfuffle 629f917cd6
cuda : add ROCM aliases for CUDA pool stuff (#3918) 2023-11-02 21:58:22 +02:00
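A hedged sketch of the alias approach, assuming the #define-based CUDA-to-HIP mapping that ggml-cuda.cu already uses; the exact identifiers in the commit may differ.

```cpp
// Map the CUDA pool API onto HIP when building for ROCm.
#if defined(GGML_USE_HIPBLAS)
#define cudaMallocAsync             hipMallocAsync
#define cudaFreeAsync               hipFreeAsync
#define cudaMemPool_t               hipMemPool_t
#define cudaDeviceGetDefaultMemPool hipDeviceGetDefaultMemPool
#endif
```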
Andrei 51b2fc11f7
cmake : fix relative path to git submodule index (#3915) 2023-11-02 21:40:31 +02:00
Georgi Gerganov 224e7d5b14
readme : add notice about #3912 2023-11-02 20:44:12 +02:00
Georgi Gerganov c7743fe1c1
cuda : fix const ptrs warning causing ROCm build issues (#3913) 2023-11-02 20:32:11 +02:00
Oleksii Maryshchenko d6069051de
cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)
* Use CUDA memory pools for async alloc/dealloc.

* If the CUDA device doesn't support memory pools, fall back to the old implementation.

* Remove redundant cublasSetStream.

---------

Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>
2023-11-02 19:10:39 +02:00
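A self-contained sketch of the pattern the bullets describe, with hypothetical helper names: check pool support once per device, prefer the stream-ordered allocator, and fall back to the synchronous one.

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Allocate `size` bytes, preferring the stream-ordered pool allocator.
static void * device_alloc(size_t size, int device, cudaStream_t stream) {
    int pools_supported = 0;
    cudaDeviceGetAttribute(&pools_supported,
                           cudaDevAttrMemoryPoolsSupported, device);
    void * ptr = nullptr;
    if (pools_supported) {
        cudaMallocAsync(&ptr, size, stream);   // async, pool-backed
    } else {
        cudaMalloc(&ptr, size);                // old synchronous path
    }
    return ptr;
}

// Free with the matching API for whichever path allocated the pointer.
static void device_free(void * ptr, int pools_supported, cudaStream_t stream) {
    if (pools_supported) {
        cudaFreeAsync(ptr, stream);
    } else {
        cudaFree(ptr);
    }
}
```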
Georgi Gerganov 4ff1046d75
gguf : print error for GGUFv1 files (#3908) 2023-11-02 16:22:30 +02:00