Commit Graph

1581 Commits

Author SHA1 Message Date
Adam Treat 8400015337 Don't try an allocation on a heap that is smaller than the size we require. 2023-11-03 17:22:22 -04:00
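The commit above adds a guard so an allocation is never attempted on a memory heap that cannot hold it. A minimal sketch of that kind of capacity check against the plain Vulkan API (an illustration only, not the Kompute allocator; `physicalDevice` and `requiredSize` are assumed inputs):

```cpp
#include <vulkan/vulkan.h>

// Hypothetical helper: return true if at least one memory heap on the device
// is large enough for an allocation of `requiredSize` bytes, so we don't even
// try an allocation that is guaranteed to fail.
static bool device_has_heap_for(VkPhysicalDevice physicalDevice, VkDeviceSize requiredSize) {
    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(physicalDevice, &props);
    for (uint32_t i = 0; i < props.memoryHeapCount; ++i) {
        if (props.memoryHeaps[i].size >= requiredSize) {
            return true; // this heap can hold the allocation
        }
    }
    return false; // every heap is smaller than what we need
}
```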
cebtenzzre cbc0d1af79 kompute : make scripts executable 2023-11-03 17:22:22 -04:00
cebtenzzre 21841d3163 kompute : enable kp_logger and make it static (#8) 2023-11-03 17:22:22 -04:00
Aaron Miller cc05a602d6 use mat*vec shaders for mat*mat
I wrote the mat*mat shaders from scratch so I understand them better, but they are currently not faster than just multiply-invoking the mat*vec shaders, by a significant degree. So, except for f32, which needed a new shader, revert to the m*v ones here.
2023-11-03 17:22:22 -04:00
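The commit body above describes computing mat*mat by repeatedly invoking the existing mat*vec path. As a rough illustration of the idea (plain CPU C++, not the actual Kompute shaders; the function names and row-major layout are assumptions for the sketch), C = A*B can be formed one column at a time, since C[:,c] = A * B[:,c]:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical mat*vec primitive: y = A * x, where A is m x k (row-major).
static void mat_vec(const std::vector<float> & A, const std::vector<float> & x,
                    std::vector<float> & y, std::size_t m, std::size_t k) {
    for (std::size_t i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < k; ++j) {
            acc += A[i*k + j] * x[j];
        }
        y[i] = acc;
    }
}

// mat*mat expressed as one mat*vec per column of B.
// A is m x k, B is k x n, C is m x n (all row-major).
static void mat_mat_via_mat_vec(const std::vector<float> & A, const std::vector<float> & B,
                                std::vector<float> & C,
                                std::size_t m, std::size_t k, std::size_t n) {
    std::vector<float> x(k), y(m);
    for (std::size_t c = 0; c < n; ++c) {
        for (std::size_t j = 0; j < k; ++j) x[j] = B[j*n + c]; // gather column c of B
        mat_vec(A, x, y, m, k);                                // y = A * B[:,c]
        for (std::size_t i = 0; i < m; ++i) C[i*n + c] = y[i]; // scatter into column c of C
    }
}
```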
Aaron Miller c1fd64548d attempted speedups 2 2023-11-03 17:22:22 -04:00
Aaron Miller 9bc52ebae3 attempted speedups 2023-11-03 17:22:22 -04:00
Aaron Miller 8dc79ac380 clean up vulkan/cpu switch 2023-11-03 17:22:22 -04:00
Aaron Miller cd0257ed0d q4_1 mat*mat 2023-11-03 17:22:22 -04:00
Aaron Miller 4809890d80 rm commented dbg print 2023-11-03 17:22:22 -04:00
Aaron Miller b78a94bc6d q6k mm works 2023-11-03 17:22:22 -04:00
Aaron Miller d5741c07a5 use op param epsilon for norms 2023-11-03 17:22:22 -04:00
Aaron Miller 3327d84a7f perf: use bigger threadgroups in mm 2023-11-03 17:22:22 -04:00
Aaron Miller 46385ee0d5 misc vulkan cleanup
make push constants consistent w/ dispatch, avoid a double free
2023-11-03 17:22:22 -04:00
Aaron Miller f0cd38b9ad add mat*mat ops 2023-11-03 17:22:22 -04:00
Adam Treat 09d83f0401 Delete TODO now that we have q8_0. 2023-11-03 17:22:22 -04:00
Aaron Miller 8564f79036 falcon h2d + reenable vulkan 2023-11-03 17:22:22 -04:00
Aaron Miller 020b1745a0 vulkan: implement neox mode for rope 2023-11-03 17:22:21 -04:00
Aaron Miller ff4212d20f q8 mat*vec 2023-11-03 17:22:21 -04:00
Aaron Miller 9db90cbe12 f16 mv broadcasting fix (gqa fix) 2023-11-03 17:22:21 -04:00
Cebtenzzre 3d850db767 kompute : remove Q6_K from list of supported quant types 2023-11-03 17:22:21 -04:00
Cebtenzzre 24a4a5956a kompute : only try to use Vulkan for LLaMA itself 2023-11-03 17:22:21 -04:00
Adam Treat bc4b5ed1cb Fixes for subgroup size to bring AMD and NVIDIA in line with each other for all kernels. 2023-11-03 17:22:21 -04:00
Adam Treat de589ced7c Change this back to be in agreement with metal and our previous softmax kernel. 2023-11-03 17:22:21 -04:00
Adam Treat 6ac39752bf Fix up the upstream CMakeLists.txt so we can build just llama.cpp with our branch. 2023-11-03 17:22:21 -04:00
Adam Treat 32289aa447 Fixes for norm. 2023-11-03 17:22:21 -04:00
Adam Treat 06d4b21598 Fix offset into the qh and now we have working Vulkan acceleration for GGUF'd llama. 2023-11-03 17:22:21 -04:00
Adam Treat f1c9bc1821 Add q6_k getrows and mul*vec kernel. 2023-11-03 17:22:21 -04:00
Adam Treat 4b223ec432 Refactor getrows to use common code and get ready for q6_k. 2023-11-03 17:22:21 -04:00
Adam Treat 5509f74318 Minor cleanup. 2023-11-03 17:22:21 -04:00
Adam Treat 601905e75e Move the subgroups and printf into common. 2023-11-03 17:22:21 -04:00
Adam Treat 93306f16d0 Consolidate code for mat x vec kernels and use subgroups more extensively. 2023-11-03 17:22:21 -04:00
Adam Treat 77135a3bf5 Add common boilerplate code via an include and eliminate copy-paste. 2023-11-03 17:22:21 -04:00
Adam Treat 9e4f8b4acc Upload immediately to device. 2023-11-03 17:22:21 -04:00
Cebtenzzre 6b6c73a9e3 kompute : don't fail build because of -Warray-bounds
There are some warnings in debug builds that are likely to be false
positives.
2023-11-03 17:22:21 -04:00
Adam Treat 1b1416d7b7 Support for gguf. 2023-11-03 17:22:20 -04:00
Peter Sugihara d9b33fe95b metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938) 2023-11-03 21:18:18 +02:00
Xiao-Yong Jin 5ba3746171 ggml-metal: fix yarn rope (#3937) 2023-11-03 14:00:31 -04:00
slaren abb77e7319 ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) 2023-11-03 12:13:09 +01:00
Georgi Gerganov 8f961abdc4 speculative : change default p_accept to 0.5 + CLI args (#3919)
ggml-ci
2023-11-03 09:41:56 +02:00
Georgi Gerganov 05816027d6 common : YAYF (yet another YARN fix) (#3925)
ggml-ci
2023-11-03 09:24:00 +02:00
cebtenzzre 3fdbe6b66b llama : change yarn_ext_factor placeholder to -1 (#3922) 2023-11-03 08:31:58 +02:00
Kerfuffle 629f917cd6 cuda : add ROCM aliases for CUDA pool stuff (#3918) 2023-11-02 21:58:22 +02:00
Andrei 51b2fc11f7 cmake : fix relative path to git submodule index (#3915) 2023-11-02 21:40:31 +02:00
Georgi Gerganov 224e7d5b14 readme : add notice about #3912 2023-11-02 20:44:12 +02:00
Georgi Gerganov c7743fe1c1 cuda : fix const ptrs warning causing ROCm build issues (#3913) 2023-11-02 20:32:11 +02:00
Oleksii Maryshchenko d6069051de cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)
* Using CUDA memory pools for async alloc/dealloc.

* If the CUDA device doesn't support memory pools, then use the old implementation.

* Removed redundant cublasSetStream

---------

Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>
2023-11-02 19:10:39 +02:00
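The commit above describes preferring stream-ordered, pool-backed allocation when the device supports it and falling back to the old path otherwise. A minimal sketch of that detect-and-fallback pattern using the public CUDA runtime API (an illustration, not the llama.cpp implementation; `buf`, `size`, `device`, and `stream` are placeholder names):

```cpp
#include <cuda_runtime.h>

// Allocate `size` bytes on `device`, using the stream-ordered (memory pool)
// allocator when available, and plain cudaMalloc otherwise.
static cudaError_t alloc_device_buffer(void ** buf, size_t size, int device, cudaStream_t stream) {
    int pools_supported = 0;
    cudaDeviceGetAttribute(&pools_supported, cudaDevAttrMemoryPoolsSupported, device);
    if (pools_supported) {
        return cudaMallocAsync(buf, size, stream); // async, pool-backed allocation
    }
    return cudaMalloc(buf, size);                  // fallback: classic synchronous allocation
}

// Matching free: stream-ordered when pools are supported, synchronous otherwise.
static cudaError_t free_device_buffer(void * buf, int device, cudaStream_t stream) {
    int pools_supported = 0;
    cudaDeviceGetAttribute(&pools_supported, cudaDevAttrMemoryPoolsSupported, device);
    if (pools_supported) {
        return cudaFreeAsync(buf, stream);         // returns memory to the pool in stream order
    }
    return cudaFree(buf);
}
```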
Georgi Gerganov 4ff1046d75 gguf : print error for GGUFv1 files (#3908) 2023-11-02 16:22:30 +02:00
slaren 21958bb393 cmake : disable LLAMA_NATIVE by default (#3906) 2023-11-02 14:10:33 +02:00
Georgi Gerganov 2756c4fbff gguf : remove special-case code for GGUFv1 (#3901)
ggml-ci
2023-11-02 11:20:21 +02:00
Georgi Gerganov 1efae9b7dc llm : prevent 1-D tensors from being GPU split (#3697) 2023-11-02 09:54:44 +02:00