Commit Graph

1581 Commits

Author SHA1 Message Date
Adam Treat 8400015337 Don't try an allocation on a heap that is smaller than the size we require. 2023-11-03 17:22:22 -04:00
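The commit above adds a guard so an allocation is never attempted on a memory heap that cannot hold it. A minimal sketch of that kind of capacity check against the plain Vulkan API (an illustration only, not the Kompute allocator; `physicalDevice` and `requiredSize` are assumed inputs):

```cpp
#include <vulkan/vulkan.h>

// Hypothetical helper: return true if at least one memory heap on the device
// is large enough for an allocation of `requiredSize` bytes, so we don't even
// try an allocation that is guaranteed to fail.
static bool device_has_heap_for(VkPhysicalDevice physicalDevice, VkDeviceSize requiredSize) {
    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(physicalDevice, &props);
    for (uint32_t i = 0; i < props.memoryHeapCount; ++i) {
        if (props.memoryHeaps[i].size >= requiredSize) {
            return true; // this heap can hold the allocation
        }
    }
    return false; // every heap is smaller than what we need
}
```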
cebtenzzre cbc0d1af79 kompute : make scripts executable 2023-11-03 17:22:22 -04:00
cebtenzzre 21841d3163 kompute : enable kp_logger and make it static (#8) 2023-11-03 17:22:22 -04:00
Aaron Miller cc05a602d6 use mat*vec shaders for mat*mat
I wrote the mat*mat shaders from scratch so I understand them better, but they are currently not faster than just multiply-invoking the mat*vec shaders, by a significant degree. So, except for f32, which needed a new shader, revert to the m*v ones here.
2023-11-03 17:22:22 -04:00
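The commit body above describes computing mat*mat by repeatedly invoking the existing mat*vec path. As a rough illustration of the idea (plain CPU C++, not the actual Kompute shaders; the function names and row-major layout are assumptions for the sketch), C = A*B can be formed one column at a time, since C[:,c] = A * B[:,c]:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical mat*vec primitive: y = A * x, where A is m x k (row-major).
static void mat_vec(const std::vector<float> & A, const std::vector<float> & x,
                    std::vector<float> & y, std::size_t m, std::size_t k) {
    for (std::size_t i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < k; ++j) {
            acc += A[i*k + j] * x[j];
        }
        y[i] = acc;
    }
}

// mat*mat expressed as one mat*vec per column of B.
// A is m x k, B is k x n, C is m x n (all row-major).
static void mat_mat_via_mat_vec(const std::vector<float> & A, const std::vector<float> & B,
                                std::vector<float> & C,
                                std::size_t m, std::size_t k, std::size_t n) {
    std::vector<float> x(k), y(m);
    for (std::size_t c = 0; c < n; ++c) {
        for (std::size_t j = 0; j < k; ++j) x[j] = B[j*n + c]; // gather column c of B
        mat_vec(A, x, y, m, k);                                // y = A * B[:,c]
        for (std::size_t i = 0; i < m; ++i) C[i*n + c] = y[i]; // scatter into column c of C
    }
}
```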
Aaron Miller c1fd64548d attempted speedups 2 2023-11-03 17:22:22 -04:00
Aaron Miller 9bc52ebae3 attempted speedups 2023-11-03 17:22:22 -04:00
Aaron Miller 8dc79ac380 clean up vulkan/cpu switch 2023-11-03 17:22:22 -04:00
Aaron Miller cd0257ed0d q4_1 mat*mat 2023-11-03 17:22:22 -04:00
Aaron Miller 4809890d80 rm commented dbg print 2023-11-03 17:22:22 -04:00
Aaron Miller b78a94bc6d q6k mm works 2023-11-03 17:22:22 -04:00
Aaron Miller d5741c07a5 use op param epsilon for norms 2023-11-03 17:22:22 -04:00
Aaron Miller 3327d84a7f perf: use bigger threadgroups in mm 2023-11-03 17:22:22 -04:00
Aaron Miller 46385ee0d5 misc vulkan cleanup
make push constants consistent w/ dispatch, avoid a double free
2023-11-03 17:22:22 -04:00
Aaron Miller f0cd38b9ad add mat*mat ops 2023-11-03 17:22:22 -04:00
Adam Treat 09d83f0401 Delete TODO now that we have q8_0. 2023-11-03 17:22:22 -04:00
Aaron Miller 8564f79036 falcon h2d + reenable vulkan 2023-11-03 17:22:22 -04:00
Aaron Miller 020b1745a0 vulkan: implement neox mode for rope 2023-11-03 17:22:21 -04:00
Aaron Miller ff4212d20f q8 mat*vec 2023-11-03 17:22:21 -04:00
Aaron Miller 9db90cbe12 f16 mv broadcasting fix (gqa fix) 2023-11-03 17:22:21 -04:00
Cebtenzzre 3d850db767 kompute : remove Q6_K from list of supported quant types 2023-11-03 17:22:21 -04:00
Cebtenzzre 24a4a5956a kompute : only try to use Vulkan for LLaMA itself 2023-11-03 17:22:21 -04:00
Adam Treat bc4b5ed1cb Fixes for subgroup size to bring AMD and NVIDIA in line with each other for all kernels. 2023-11-03 17:22:21 -04:00
Adam Treat de589ced7c Change this back to be in agreement with metal and our previous softmax kernel. 2023-11-03 17:22:21 -04:00
Adam Treat 6ac39752bf Fix up the upstream CMakeLists.txt so we can build just llama.cpp with our branch. 2023-11-03 17:22:21 -04:00
Adam Treat 32289aa447 Fixes for norm. 2023-11-03 17:22:21 -04:00
Adam Treat 06d4b21598 Fix offset into the qh and now we have working Vulkan acceleration for GGUF'd llama. 2023-11-03 17:22:21 -04:00
Adam Treat f1c9bc1821 Add q6_k getrows and mul*vec kernel. 2023-11-03 17:22:21 -04:00
Adam Treat 4b223ec432 Refactor getrows to use common code and get ready for q6_k. 2023-11-03 17:22:21 -04:00
Adam Treat 5509f74318 Minor cleanup. 2023-11-03 17:22:21 -04:00
Adam Treat 601905e75e Move the subgroups and printf into common. 2023-11-03 17:22:21 -04:00
Adam Treat 93306f16d0 Consolidate code for mat x vec kernels and use subgroups more extensively. 2023-11-03 17:22:21 -04:00
Adam Treat 77135a3bf5 Add common boilerplate code via an include and eliminate copy-paste. 2023-11-03 17:22:21 -04:00
Adam Treat 9e4f8b4acc Upload immediately to device. 2023-11-03 17:22:21 -04:00
Cebtenzzre 6b6c73a9e3 kompute : don't fail build because of -Warray-bounds
There are some warnings in debug builds that are likely to be false
positives.
2023-11-03 17:22:21 -04:00
Adam Treat 1b1416d7b7 Support for gguf. 2023-11-03 17:22:20 -04:00
Peter Sugihara d9b33fe95b metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938) 2023-11-03 21:18:18 +02:00
Xiao-Yong Jin 5ba3746171 ggml-metal: fix yarn rope (#3937) 2023-11-03 14:00:31 -04:00
slaren abb77e7319 ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) 2023-11-03 12:13:09 +01:00
Georgi Gerganov 8f961abdc4 speculative : change default p_accept to 0.5 + CLI args (#3919)
ggml-ci
2023-11-03 09:41:56 +02:00
Georgi Gerganov 05816027d6 common : YAYF (yet another YARN fix) (#3925)
ggml-ci
2023-11-03 09:24:00 +02:00
cebtenzzre 3fdbe6b66b llama : change yarn_ext_factor placeholder to -1 (#3922) 2023-11-03 08:31:58 +02:00
Kerfuffle 629f917cd6 cuda : add ROCM aliases for CUDA pool stuff (#3918) 2023-11-02 21:58:22 +02:00
Andrei 51b2fc11f7 cmake : fix relative path to git submodule index (#3915) 2023-11-02 21:40:31 +02:00
Georgi Gerganov 224e7d5b14 readme : add notice about #3912 2023-11-02 20:44:12 +02:00
Georgi Gerganov c7743fe1c1 cuda : fix const ptrs warning causing ROCm build issues (#3913) 2023-11-02 20:32:11 +02:00
Oleksii Maryshchenko d6069051de cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)
* Using CUDA memory pools for async alloc/dealloc.

* If the CUDA device doesn't support memory pools, then use the old implementation.

* Removed redundant cublasSetStream

---------

Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>
2023-11-02 19:10:39 +02:00
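The commit above describes preferring stream-ordered, pool-backed allocation when the device supports it and falling back to the old path otherwise. A minimal sketch of that detect-and-fallback pattern using the public CUDA runtime API (an illustration, not the llama.cpp implementation; `buf`, `size`, `device`, and `stream` are placeholder names):

```cpp
#include <cuda_runtime.h>

// Allocate `size` bytes on `device`, using the stream-ordered (memory pool)
// allocator when available, and plain cudaMalloc otherwise.
static cudaError_t alloc_device_buffer(void ** buf, size_t size, int device, cudaStream_t stream) {
    int pools_supported = 0;
    cudaDeviceGetAttribute(&pools_supported, cudaDevAttrMemoryPoolsSupported, device);
    if (pools_supported) {
        return cudaMallocAsync(buf, size, stream); // async, pool-backed allocation
    }
    return cudaMalloc(buf, size);                  // fallback: classic synchronous allocation
}

// Matching free: stream-ordered when pools are supported, synchronous otherwise.
static cudaError_t free_device_buffer(void * buf, int device, cudaStream_t stream) {
    int pools_supported = 0;
    cudaDeviceGetAttribute(&pools_supported, cudaDevAttrMemoryPoolsSupported, device);
    if (pools_supported) {
        return cudaFreeAsync(buf, stream);         // returns memory to the pool in stream order
    }
    return cudaFree(buf);
}
```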
Georgi Gerganov 4ff1046d75 gguf : print error for GGUFv1 files (#3908) 2023-11-02 16:22:30 +02:00
slaren 21958bb393 cmake : disable LLAMA_NATIVE by default (#3906) 2023-11-02 14:10:33 +02:00
Georgi Gerganov 2756c4fbff gguf : remove special-case code for GGUFv1 (#3901)
ggml-ci
2023-11-02 11:20:21 +02:00
Georgi Gerganov 1efae9b7dc llm : prevent 1-D tensors from being GPU split (#3697) 2023-11-02 09:54:44 +02:00