Commit Graph

1315 Commits

Author SHA1 Message Date
Aaron Miller 8564f79036 falcon h2d + re-enable vulkan 2023-11-03 17:22:22 -04:00
Aaron Miller 020b1745a0 vulkan: implement neox mode for rope 2023-11-03 17:22:21 -04:00
Aaron Miller ff4212d20f q8 mat*vec 2023-11-03 17:22:21 -04:00
Aaron Miller 9db90cbe12 f16 mv broadcasting fix (gqa fix) 2023-11-03 17:22:21 -04:00
Cebtenzzre 3d850db767 kompute : remove Q6_K from list of supported quant types 2023-11-03 17:22:21 -04:00
Cebtenzzre 24a4a5956a kompute : only try to use Vulkan for LLaMA itself 2023-11-03 17:22:21 -04:00
Adam Treat bc4b5ed1cb Fixes for subgroup size to bring AMD and NVIDIA in line with each other for all kernels. 2023-11-03 17:22:21 -04:00
Adam Treat de589ced7c Change this back to be in agreement with metal and our previous softmax kernel. 2023-11-03 17:22:21 -04:00
Adam Treat 6ac39752bf Fix up the upstream CMakeLists.txt so we can build just llama.cpp with our branch. 2023-11-03 17:22:21 -04:00
Adam Treat 32289aa447 Fixes for norm. 2023-11-03 17:22:21 -04:00
Adam Treat 06d4b21598 Fix offset into the qh and now we have working Vulkan acceleration for gguf'd llama. 2023-11-03 17:22:21 -04:00
Adam Treat f1c9bc1821 Add q6_k getrows and mul*vec kernel. 2023-11-03 17:22:21 -04:00
Adam Treat 4b223ec432 Refactor getrows to use common code and get ready for q6_k. 2023-11-03 17:22:21 -04:00
Adam Treat 5509f74318 Minor cleanup. 2023-11-03 17:22:21 -04:00
Adam Treat 601905e75e Move the subgroups and printf into common. 2023-11-03 17:22:21 -04:00
Adam Treat 93306f16d0 Consolidate code for mat x vec kernels and use subgroups more extensively. 2023-11-03 17:22:21 -04:00
Adam Treat 77135a3bf5 Add common boilerplate code via an include and eliminate copy-paste. 2023-11-03 17:22:21 -04:00
Adam Treat 9e4f8b4acc Upload immediately to device. 2023-11-03 17:22:21 -04:00
Cebtenzzre 6b6c73a9e3 kompute : don't fail build because of -Warray-bounds
There are some warnings in debug builds that are likely to be false
positives.
2023-11-03 17:22:21 -04:00
Adam Treat 1b1416d7b7 Support for gguf. 2023-11-03 17:22:20 -04:00
Adam Treat 2c24d67e7b Don't crash on available devices if we can't even create an instance. 2023-10-05 13:39:18 -04:00
Adam Treat addac25293 Set the singleton to nullptr here. 2023-10-05 13:39:18 -04:00
Adam Treat 68aca6be08 Only use Vulkan with known quants that work. 2023-10-05 13:39:18 -04:00
Adam Treat 4ed25b2f88 Sync from device back to host at begin of new prompt. 2023-10-05 13:39:18 -04:00
Adam Treat bd5f6399bb Don't try to install kompute artifacts. 2023-10-05 13:39:18 -04:00
Aaron Miller 8bea719879 vulkan: disambiguate gpus with the same name 2023-10-05 13:39:18 -04:00
Adam Treat 68cf1df6fb Throw an exception when allocation fails for vulkan. 2023-10-05 13:39:18 -04:00
Aaron Miller beee57266f Make kompute actually include external SDK headers when requested 2023-10-05 13:39:18 -04:00
Adam Treat b7e2e691d4 Completely revamp how we do object management with the vulkan backend and
stop using so many static objects so we can tear down and bring up vulkan
on new devices in the same runtime.
2023-10-05 13:39:18 -04:00
Adam Treat 45c8778b49 Switch to a dynamic dispatch table instead of linking hard against libvulkan. 2023-10-05 13:39:18 -04:00
Aaron Miller 8563fa001f remove dynamic deps from kompute build
should no longer have new external deps other than libvulkan

```
ubuntu@ip-172-31-1-24:~/repo/gpt4all/gpt4all-backend/build$ ldd ./libllamamodel-mainline-avxonly.so
        linux-vdso.so.1 (0x00007ffcb53bb000)
        libvulkan.so.1 => /lib/x86_64-linux-gnu/libvulkan.so.1 (0x00007f239dab5000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f239d800000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f239d719000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f239da95000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f239d400000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f239dd1d000)
```
2023-10-05 13:39:18 -04:00
Adam Treat 48a45ea435 Remove warning which fails on windows. 2023-10-05 13:39:18 -04:00
niansa ba15dfd0be Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. 2023-10-05 13:39:18 -04:00
Kevin Ji 45855b3f1c docs : mark code as Bash (#3375) 2023-09-28 09:11:32 -04:00
Pierre Alexandre SCHEMBRI 4aea3b846e readme : add Mistral AI release 0.1 (#3362) 2023-09-28 15:13:37 +03:00
slaren da0400344b ggml-cuda : perform cublas fp16 matrix multiplication as fp16 (#3370)
* ggml-cuda : perform cublas fp16 matrix multiplication as fp16

* try to fix rocm build

* restrict fp16 mat mul to volta and up
2023-09-28 13:08:28 +03:00
Zhang Peiyuan e519621010 convert : remove bug in convert.py permute function (#3364) 2023-09-27 20:45:20 +02:00
Richard Roberson ac43576124 make-ggml.py : compatibility with more models and GGUF (#3290)
* Resync my fork with new llama.cpp commits

* examples : rename to use dash instead of underscore

* New model conversions

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-27 19:25:12 +03:00
Cebtenzzre 20c7e1e804 gguf : fix a few general keys (#3341) 2023-09-27 12:18:07 -04:00
Rickard Hallerbäck dc6897404e metal : reusing llama.cpp logging (#3152)
* metal : reusing llama.cpp logging

* cmake : build fix

* metal : logging callback

* metal : logging va_args memory fix

* metal : minor cleanup

* metal : setting function like logging macro to capital letters

* llama.cpp : trailing whitespace fix

* ggml : log level enum used by llama

* Makefile : cleanup ggml-metal recipe

* ggml : ggml_log_callback typedef

* ggml : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-27 18:48:33 +03:00
Jag Chadha 527e57cfd8 build : add ACCELERATE_NEW_LAPACK to fix warning on macOS Sonoma (#3342) 2023-09-27 18:34:32 +03:00
BarfingLemurs ffe88a36a9 readme : add some recent perplexity and bpw measurements to READMEs, link for k-quants (#3340)
* Update README.md

* Update README.md

* Update README.md with k-quants bpw measurements
2023-09-27 18:30:36 +03:00
DAN™ 99115f3fa6 cmake : fix build-info.h on MSVC (#3309) 2023-09-25 18:45:33 -04:00
2f38b454 1726f9626f docs: Fix typo CLBlast_DIR var. (#3330) 2023-09-25 20:24:52 +02:00
Erik Scholz a98b1633d5 nix : add cuda, use a symlinked toolkit for cmake (#3202) 2023-09-25 13:48:30 +02:00
slaren c091cdfb24 llama-bench : add README (#3317)
* llama-bench : add README

* minor edit
2023-09-23 21:48:24 +02:00
Cebtenzzre 51a7cf5c6e examples : fix RoPE defaults to match PR #3240 (#3315) 2023-09-23 12:28:50 +03:00
Kevin Ji bedb92b603 scripts : use `/usr/bin/env` in shebang (#3313) 2023-09-22 23:52:23 -04:00
Lee Drake bc9d3e3971 Update README.md (#3289)
* Update README.md

* Update README.md

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2023-09-21 21:00:24 +02:00
shibe2 36b904e200 ggml-opencl.cpp: Make private functions static (#3300) 2023-09-21 14:10:26 -04:00