llama.cpp

Commit Graph

Author	SHA1	Message	Date
Justine Tunney	cbddf4661b	Get mmap() working with WIN32 MSVC - We have pretty high quality POSIX polyfills now - We no longer need to override malloc() Tracked by issue #91 Improves upon #341	2023-03-28 10:10:02 -07:00
oKatanaaa	e4881686b4	Make WIN32 mmap() improvements (#341) Still not fully working yet. Closes #341	2023-03-28 09:19:03 -07:00
Justine Tunney	0b5448a3a4	Implement system polyfill for win32 / posix.1 I don't have access to Microsoft Visual Studio right now (aside from the GitHub Actions CI system) but I think this code should come close to what we want in terms of polyfilling UNIX functionality.	2023-03-17 21:22:40 -07:00
Justine Tunney	5b8023d935	Implement prototype for instant mmap() loading This change uses a custom malloc() implementation to transactionally capture to a file dynamic memory created during the loading process. That includes (1) the malloc() allocation for mem_buffer and (2) all the C++ STL objects. On my $1000 personal computer, this change lets me run ./main to generate a single token (-n 1) using the float16 7B model (~12gb size) in one second. In order to do that, there's a one time cost where a 13gb file needs to be generated. This change rocks but it shouldn't be necessary to do something this heroic. We should instead change the file format, so that tensors don't need reshaping and realignment in order to be loaded.	2023-03-16 22:16:33 -07:00
Justine Tunney	2788f373be	Get the build working	2023-03-15 03:14:20 -07:00
Ronsor	47857e564c	Don't use vdotq_s32 if it's not available (#139) * Don't use vdotq_s32 if it's not available `dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available. Reintroduces the code removed in `84d9015` if `__ARM_FEATURE_DOTPROD` isn't defined. * Update ggml.c --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-14 21:34:37 +02:00
Radoslav Gerganov	60f819a2b1	Add section to README on how to run the project on Android (#130)	2023-03-14 15:30:08 +02:00
Georgi Gerganov	97ab2b2578	Add Misc section + update hot topics + minor fixes	2023-03-14 09:43:52 +02:00
Sebastián A	2f700a2738	Add windows to the CI (#98)	2023-03-13 22:29:10 +02:00
Georgi Gerganov	c09a9cfb06	CMake build in Release by default (#75)	2023-03-13 21:22:15 +02:00
Georgi Gerganov	7ec903d3c1	Update contribution section, hot topics, limitations, etc.	2023-03-13 19:21:51 +02:00
Georgi Gerganov	4497ad819c	Print system information	2023-03-13 19:15:08 +02:00
Sebastián A	ed6849cc07	Initial support for CMake (#75)	2023-03-13 19:12:33 +02:00
Thomas Klausner	41be0a3b3d	Add NetBSD support. (#90)	2023-03-13 18:40:54 +02:00
Pavol Rusnak	671d5cac15	Use fprintf for diagnostic output (#48). Keep printf only for printing model output; one can now use ./main ... 2>/dev/null to suppress any diagnostic output	2023-03-13 18:39:56 +02:00
Georgi Gerganov	84d9015c4a	Use vdotq_s32 to improve performance (#67) * 10% performance boost on ARM * Back to original change	2023-03-13 18:36:44 +02:00
uint256_t	63fd76fbb0	Reduce model loading time (#43) * Use buffering * Use vector * Minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-13 18:33:43 +02:00
Val Kharitonov	2a20f48efa	Fix UTF-8 handling (including colors) (#79)	2023-03-13 18:24:18 +02:00
Pavol Rusnak	d1f224712d	Add quantize script for batch quantization (#92) * Add quantize script for batch quantization * Indentation * README for new quantize.sh * Fix script name * Fix file list on Mac OS --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-13 18:15:20 +02:00
Georgi Gerganov	1808ee0500	Add initial contribution guidelines	2023-03-13 09:42:26 +02:00
Matvey Soloviev	a169bb889c	Gate signal support on being on a unixoid system. (#74)	2023-03-13 04:08:01 +01:00
Matvey Soloviev	460c482540	Fix token count accounting	2023-03-13 01:04:41 +01:00
Georgi Gerganov	c80e2a8f2a	Revert "10% performance boost on ARM" This reverts commit `113a9e83eb`. There are some reports of illegal instructions. Moved this stuff to the vdotq_s32 branch until resolved	2023-03-13 01:28:08 +02:00
Georgi Gerganov	54a0e66ea0	Check for vdotq_s32 availability	2023-03-13 01:21:03 +02:00
Georgi Gerganov	543c57e991	Amend to previous commit - forgot to update non-QRDMX branch	2023-03-13 01:05:24 +02:00
Georgi Gerganov	113a9e83eb	10% performance boost on ARM	2023-03-13 00:56:10 +02:00
Matvey Soloviev	404fac0d62	Fix color getting reset before prompt output done (#65) (cherry picked from commit 7eb2987619feee04c40eff69b604017d09919cb6)	2023-03-13 00:07:34 +02:00
Georgi Gerganov	1a0a74300f	Update README.md	2023-03-12 23:39:01 +02:00
Matvey Soloviev	96ea727f47	Add interactive mode (#61) * Initial work on interactive mode. * Improve interactive mode. Make rev. prompt optional. * Update README to explain interactive mode. * Fix OS X build	2023-03-12 23:13:28 +02:00
Marc Köhlbrugge	9661954835	Fix typo in README (#45)	2023-03-12 22:30:08 +02:00
Ben Garney	f385f8dee8	Allow using prompt files (#59)	2023-03-12 22:28:36 +02:00
beiller	02f0c6fe7f	Add back top_k (#56) * Add back top_k * Update utils.cpp * Update utils.h --------- Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-12 22:23:15 +02:00
Sebastián A	eb062bb012	Windows fixes (#31) * Apply fixes suggested to build on windows Issue: https://github.com/ggerganov/llama.cpp/issues/22 * Remove unsupported VLAs * MSVC: Remove features that are only available on MSVC C++20. * Fix zero initialization of the other fields. * Change the use of vector for stack allocations.	2023-03-12 22:15:00 +02:00
Georgi Gerganov	7027a97837	Update README.md	2023-03-12 22:09:26 +02:00
Georgi Gerganov	2d555e5b42	Add CI (#60)	2023-03-12 22:08:24 +02:00
Georgi Gerganov	7c9e54e55e	Revert "weights_only" arg - this causing more trouble than help	2023-03-12 20:59:01 +02:00
Oleksandr Nikitin	b9bd1d0141	python/pytorch compat notes (#44)	2023-03-12 14:16:33 +02:00
beiller	129c7d1ea8	Add repetition penalty (#20) * Adding repeat penalization * Update utils.h * Update utils.cpp * Numeric fix Should probably still scale by temp even if penalized * Update comments, more proper application I see that numbers can go negative so a fix from a referenced commit * Minor formatting --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-12 11:27:42 +02:00
Georgi Gerganov	702fddf5c5	Clarify meaning of hacking	2023-03-12 09:03:25 +02:00
Georgi Gerganov	7d86e25bf6	README: add "Supported platforms" + update hot topics	2023-03-12 08:41:54 +02:00
deepdiffuser	a93120236f	use weights_only in conversion script (#32) this restricts malicious weights from executing arbitrary code by restricting the unpickler to only loading tensors, primitive types, and dictionaries	2023-03-12 08:36:35 +02:00
Pavol Rusnak	6a9a67f0be	Add LICENSE (#21)	2023-03-12 08:36:03 +02:00
Georgi Gerganov	da1a4ff01f	Update README.md	2023-03-12 01:26:32 +02:00
Juraj Bednar	6b2cb6302f	Fix a typo in model name (#16)	2023-03-11 19:32:20 +02:00
Georgi Gerganov	4235e3d5b3	Update README.md	2023-03-11 18:10:18 +02:00
Georgi Gerganov	f1eaff4721	Add AVX2 support for x86 architectures thanks to @Const-me !	2023-03-11 18:04:25 +02:00
Georgi Gerganov	a9e58529ea	Fix un-initialized FP16 tables on x86 (#15, #2)	2023-03-11 17:40:14 +02:00
Georgi Gerganov	7d9ed7b25f	Bump memory buffer	2023-03-11 12:45:01 +02:00
Georgi Gerganov	0c6803321c	Update README.md	2023-03-11 12:31:21 +02:00
Georgi Gerganov	f60fa9e50a	.gitignore models/	2023-03-11 12:27:02 +02:00

68 Commits