Commit Graph

1528 Commits

Author SHA1 Message Date
Jared Van Bortel 9c4dfd06e8 mention skipped change 2023-11-23 17:22:05 -05:00
Jared Van Bortel fe26e6adff Merge commit 'e16b9fa4baa8a09c6619b116159830e898050942' into nomic-vulkan 2023-11-23 17:22:04 -05:00
Jared Van Bortel 6474fc879a vulkan : handle ggml_scale for n%8 != 0
ref ggerganov/llama.cpp#3754
2023-11-23 17:22:00 -05:00
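The commit above fixes the Vulkan scale kernel for element counts that are not a multiple of 8 (the kernel's vectorization width). A minimal host-side sketch of the idea, not the actual GLSL kernel (the function name and structure here are hypothetical): process full groups of 8, then finish the `n % 8` tail with scalar code so trailing elements are no longer dropped.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration: a kernel vectorized by 8 must also cover
// the trailing n % 8 elements instead of skipping them.
void scale_f32(std::vector<float>& x, float s) {
    std::size_t n  = x.size();
    std::size_t n8 = n - n % 8;                 // largest multiple of 8 <= n
    for (std::size_t i = 0; i < n8; i += 8)     // "vectorized" main body
        for (std::size_t j = 0; j < 8; ++j)
            x[i + j] *= s;
    for (std::size_t i = n8; i < n; ++i)        // scalar tail for n % 8 != 0
        x[i] *= s;
}
```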
Jared Van Bortel 2a41ba7258 Merge commit '469c9addef75893e6be12edda852d12e840bf064' into nomic-vulkan 2023-11-23 17:22:00 -05:00
Jared Van Bortel a934b2cb8a vulkan : assert various kernel requirements 2023-11-23 17:22:00 -05:00
Jared Van Bortel f194e1b6a6 Merge commit 'fcca0a700487999d52a525c96d6661e9f6a8703a' into nomic-vulkan 2023-11-23 17:21:59 -05:00
Jared Van Bortel 39abedd1d7 vulkan : optimize workgroup sizes 2023-11-23 17:18:48 -05:00
Jared Van Bortel 84f7fc4553 vulkan : rope n_past is now KQ_pos, f16 rope kernel 2023-11-23 17:18:42 -05:00
Jared Van Bortel 71565eb0c3 vulkan : replace ggml_diag_mask_inf with ggml_add (custom -inf mask) 2023-11-23 17:18:27 -05:00
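The `ggml_diag_mask_inf` replacement above swaps a dedicated masking kernel for the generic add kernel plus a precomputed -inf mask. A hypothetical host-side sketch of the concept (not the actual ggml graph code): build a causal mask of 0 / -inf once, then adding it to attention scores sends future positions to -inf, which softmax turns into zero weight.

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch: precompute a causal mask of 0 / -INFINITY and
// apply it with an ordinary elementwise add, so one generic kernel
// covers the masking step.
std::vector<std::vector<float>> causal_mask(int n) {
    std::vector<std::vector<float>> m(n, std::vector<float>(n, 0.0f));
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            m[i][j] = -INFINITY;   // position j is in the future for row i
    return m;
}
```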
Jared Van Bortel af00cca08e Merge commit 'ec893798b7a2a803466cc8f063051499ec3d96f7' into HEAD 2023-11-08 16:36:00 -05:00
Jared Van Bortel c438c16896 fix build with external fmtlib (v10)
Co-authored-by: ToKiNoBug <tokinobug@163.com>
2023-11-08 16:31:29 -05:00
Jared Van Bortel a8cac53207 kompute : fix issues with debug layers 2023-11-08 16:31:29 -05:00
cebtenzzre f88b198885 llama : fix Vulkan whitelist (#11) 2023-11-03 17:22:22 -04:00
Adam Treat ffd0624be2 Remove this debug code. 2023-11-03 17:22:22 -04:00
Adam Treat a5eb001eab Revert the prompt processing on gpu for now.
Fixes issues #1580 and #1581
2023-11-03 17:22:22 -04:00
Adam Treat e006d377dd Scale the workgroup count down to allow correct generation for Falcon on
AMD Radeon cards with a lower workgroup count limit

Partially fixes #1581
2023-11-03 17:22:22 -04:00
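The workgroup-count fix above addresses devices whose compute dispatch limit (e.g. `maxComputeWorkGroupCount[0]`) is lower than the count a large tensor would naively require. A hypothetical sketch of the general trade, not the project's actual dispatch code: when the required group count exceeds the limit, give each workgroup more elements until the count fits.

```cpp
#include <cstdint>

// Hypothetical sketch: scale the workgroup count down by scaling the
// per-group workload up, so the dispatch fits under the device limit.
struct Dispatch { uint32_t groups; uint32_t elems_per_group; };

Dispatch plan_dispatch(uint64_t n_elems, uint32_t group_limit) {
    uint32_t per_group = 1;
    uint64_t groups = n_elems;
    while (groups > group_limit) {
        per_group *= 2;                              // each group does more work
        groups = (n_elems + per_group - 1) / per_group; // ceil-divide
    }
    return { static_cast<uint32_t>(groups), per_group };
}
```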
cebtenzzre 89b71278ff llama : decide to disable Vulkan before loading tensors (#7) 2023-11-03 17:22:22 -04:00
cebtenzzre 1c17010188 vulkan : fix missing break in matmul selection (#9) 2023-11-03 17:22:22 -04:00
Adam Treat 74ddf0f17d Fix synchronization problem for AMD Radeon with the amdvlk driver or Windows
drivers. Has no performance or fidelity effect on other GPU/driver
combos I've tested.

FIXES: https://github.com/nomic-ai/gpt4all/issues/1507
2023-11-03 17:22:22 -04:00
Adam Treat 8d9efbf97a Lower the workgroup count for some shaders by providing a loop that processes
four floats at a time.
2023-11-03 17:22:22 -04:00
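The four-floats-per-iteration change above lowers the workgroup count by a factor of four: if each shader invocation loops over 4 elements instead of 1, the dispatch needs a quarter as many groups. A hypothetical sketch of just the count arithmetic (the real change is inside the GLSL shaders):

```cpp
#include <cstdint>

// Hypothetical sketch of the dispatch math: groups needed when each of
// `local_size` invocations per group handles `per_invocation` floats.
uint32_t group_count(uint32_t n, uint32_t local_size, uint32_t per_invocation) {
    uint32_t span = local_size * per_invocation;   // elements per workgroup
    return (n + span - 1) / span;                  // ceil-divide
}
```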
Adam Treat 752f7ebd61 Remove unused push constant that was giving validation errors. 2023-11-03 17:22:22 -04:00
Adam Treat 8400015337 Don't try an allocation on a heap that is smaller than the size we require. 2023-11-03 17:22:22 -04:00
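The heap-size commit above skips Vulkan memory heaps that are too small for the requested allocation rather than letting the allocation fail. A hypothetical sketch of the selection logic, with heap sizes standing in for `VkPhysicalDeviceMemoryProperties::memoryHeaps[i].size` (names and structure here are illustrative, not the project's code):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: only consider heaps whose total size can hold
// the allocation; return -1 if no heap is large enough.
int pick_heap(const std::vector<uint64_t>& heap_sizes, uint64_t required) {
    for (int i = 0; i < static_cast<int>(heap_sizes.size()); ++i)
        if (heap_sizes[i] >= required)
            return i;   // first heap large enough for the request
    return -1;          // don't even attempt the allocation
}
```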
cebtenzzre cbc0d1af79 kompute : make scripts executable 2023-11-03 17:22:22 -04:00
cebtenzzre 21841d3163 kompute : enable kp_logger and make it static (#8) 2023-11-03 17:22:22 -04:00
Aaron Miller cc05a602d6 use mat*vec shaders for mat*mat
I wrote the mat*mat shaders from scratch so I understand them better, but
they are currently not faster, by a significant degree, than just
multiply-invoking the mat*vec shaders. So, except for f32, which needed a
new shader, revert to the m*v ones here.
2023-11-03 17:22:22 -04:00
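"Multiply-invoking the mat*vec shaders" above means computing a matrix-matrix product as one mat*vec launch per column of the right-hand matrix. A hypothetical CPU sketch of that decomposition (the real versions are Vulkan compute shaders; these function names are illustrative):

```cpp
#include <vector>

using Vec = std::vector<float>;

// Plain mat*vec: A is stored as rows, y = A * x.
Vec mat_vec(const std::vector<Vec>& A, const Vec& x) {
    Vec y(A.size(), 0.0f);
    for (std::size_t r = 0; r < A.size(); ++r)
        for (std::size_t c = 0; c < x.size(); ++c)
            y[r] += A[r][c] * x[c];
    return y;
}

// mat*mat built by invoking mat*vec once per column of B.
std::vector<Vec> mat_mat(const std::vector<Vec>& A, const std::vector<Vec>& B_cols) {
    std::vector<Vec> C_cols;
    for (const Vec& b : B_cols)
        C_cols.push_back(mat_vec(A, b));   // one "kernel invocation" per column
    return C_cols;
}
```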
Aaron Miller c1fd64548d attempted speedups 2 2023-11-03 17:22:22 -04:00
Aaron Miller 9bc52ebae3 attempted speedups 2023-11-03 17:22:22 -04:00
Aaron Miller 8dc79ac380 clean up vulkan/cpu switch 2023-11-03 17:22:22 -04:00
Aaron Miller cd0257ed0d q4_1 mat*mat 2023-11-03 17:22:22 -04:00
Aaron Miller 4809890d80 rm commented dbg print 2023-11-03 17:22:22 -04:00
Aaron Miller b78a94bc6d q6k mm works 2023-11-03 17:22:22 -04:00
Aaron Miller d5741c07a5 use op param epsilon for norms 2023-11-03 17:22:22 -04:00
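"Use op param epsilon for norms" above means the norm kernels read epsilon from the operation's parameters instead of a hardcoded constant, since different models use different eps values. A hypothetical RMS-norm sketch showing where that parameter enters (illustrative, not the actual kernel):

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch: eps comes in as an op parameter rather than
// being baked into the kernel as a constant.
std::vector<float> rms_norm(const std::vector<float>& x, float eps) {
    float ss = 0.0f;
    for (float v : x) ss += v * v;
    float scale = 1.0f / std::sqrt(ss / x.size() + eps); // eps stabilizes the sqrt
    std::vector<float> y;
    y.reserve(x.size());
    for (float v : x) y.push_back(v * scale);
    return y;
}
```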
Aaron Miller 3327d84a7f perf: use bigger threadgroups in mm 2023-11-03 17:22:22 -04:00
Aaron Miller 46385ee0d5 misc vulkan cleanup
make push constants consistent w/ dispatch, avoid a double free
2023-11-03 17:22:22 -04:00
Aaron Miller f0cd38b9ad add mat*mat ops 2023-11-03 17:22:22 -04:00
Adam Treat 09d83f0401 Delete TODO now that we have q8_0. 2023-11-03 17:22:22 -04:00
Aaron Miller 8564f79036 falcon h2d + reenable vulkan 2023-11-03 17:22:22 -04:00
Aaron Miller 020b1745a0 vulkan: implement neox mode for rope 2023-11-03 17:22:21 -04:00
Aaron Miller ff4212d20f q8 mat*vec 2023-11-03 17:22:21 -04:00
Aaron Miller 9db90cbe12 f16 mv broadcasting fix (gqa fix) 2023-11-03 17:22:21 -04:00
Cebtenzzre 3d850db767 kompute : remove Q6_K from list of supported quant types 2023-11-03 17:22:21 -04:00
Cebtenzzre 24a4a5956a kompute : only try to use Vulkan for LLaMA itself 2023-11-03 17:22:21 -04:00
Adam Treat bc4b5ed1cb Fixes for subgroup size to bring AMD and NVIDIA in line with each other for all kernels. 2023-11-03 17:22:21 -04:00
Adam Treat de589ced7c Change this back to be in agreement with metal and our previous softmax kernel. 2023-11-03 17:22:21 -04:00
Adam Treat 6ac39752bf Fix up the upstream CMakeLists.txt so we can build just llama.cpp with our branch. 2023-11-03 17:22:21 -04:00
Adam Treat 32289aa447 Fixes for norm. 2023-11-03 17:22:21 -04:00
Adam Treat 06d4b21598 Fix offset into qh; now we have working Vulkan acceleration for GGUF'd LLaMA. 2023-11-03 17:22:21 -04:00
Adam Treat f1c9bc1821 Add q6_k getrows and mul*vec kernel. 2023-11-03 17:22:21 -04:00
Adam Treat 4b223ec432 Refactor getrows to use common code and get ready for q6_k. 2023-11-03 17:22:21 -04:00
Adam Treat 5509f74318 Minor cleanup. 2023-11-03 17:22:21 -04:00