Commit Graph

68 Commits

Author SHA1 Message Date
Justine Tunney cbddf4661b
Get mmap() working with WIN32 MSVC
- We have pretty high quality POSIX polyfills now
- We no longer need to override malloc()

Tracked by issue #91
Improves upon #341
2023-03-28 10:10:02 -07:00
oKatanaaa e4881686b4
Make WIN32 mmap() improvements (#341)
Still not fully working.

Closes #341
2023-03-28 09:19:03 -07:00
Justine Tunney 0b5448a3a4
Implement system polyfill for win32 / posix.1
I don't have access to Microsoft Visual Studio right now (aside from the
GitHub Actions CI system), but I think this code should come close to
what we want in terms of polyfilling UNIX functionality.
2023-03-17 21:22:40 -07:00
Justine Tunney 5b8023d935
Implement prototype for instant mmap() loading
This change uses a custom malloc() implementation to transactionally
capture, to a file, the dynamic memory created during the loading process.
That includes (1) the malloc() allocation for mem_buffer and (2) all
the C++ STL objects. On my $1000 personal computer, this change lets
me run ./main to generate a single token (-n 1) using the float16 7B
model (~12 GB) in one second. In order to do that, there's a one-time
cost where a 13 GB file needs to be generated. This change rocks,
but it shouldn't be necessary to do something this heroic. We should
instead change the file format, so that tensors don't need reshaping
and realignment in order to be loaded.
2023-03-16 22:16:33 -07:00
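The prototype above loads the model by mapping a pre-generated file into memory. As a minimal sketch of just the mapping step, assuming a POSIX system, a whole file can be mapped read-only as shown below. `map_file_readonly` is a hypothetical helper name; the custom `malloc()` capture described in the commit and the Windows side are not reproduced, and the flags the prototype actually used may differ.

```c
#include <fcntl.h>     /* open, O_RDONLY */
#include <stddef.h>    /* size_t, NULL */
#include <sys/mman.h>  /* mmap, munmap, MAP_FAILED */
#include <sys/stat.h>  /* fstat, struct stat */
#include <unistd.h>    /* close */

/* Hypothetical helper: map an entire file read-only into memory.
 * Returns the mapping and stores its length in *out_size, or returns
 * NULL on failure. The kernel pages data in lazily on first access,
 * so the call itself returns quickly even for a large file. */
static void *map_file_readonly(const char *path, size_t *out_size) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return NULL;
    }
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    /* The mapping stays valid after the descriptor is closed. */
    close(fd);
    if (addr == MAP_FAILED) {
        return NULL;
    }
    *out_size = (size_t)st.st_size;
    return addr;
}
```

Release the mapping with `munmap(addr, size)` when it is no longer needed.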
Justine Tunney 2788f373be
Get the build working 2023-03-15 03:14:20 -07:00
Ronsor 47857e564c
Don't use vdotq_s32 if it's not available (#139)
* Don't use vdotq_s32 if it's not available

`dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available.

Reintroduces the code removed in 84d9015 if `__ARM_FEATURE_DOTPROD` isn't defined.

* Update ggml.c

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-14 21:34:37 +02:00
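The guard described in this commit can be sketched as follows. This is not the actual `ggml.c` code: `dot_i8` is a hypothetical helper, the fallback is a plain scalar loop rather than the code that #139 reintroduces, and the extra `__aarch64__` check is an assumption added because `vaddvq_s32` exists only on AArch64.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#define USE_VDOTQ 1
#else
#define USE_VDOTQ 0
#endif

/* Hypothetical helper: dot product of two int8 vectors of length n,
 * accumulated in 32 bits. */
static int32_t dot_i8(const int8_t *a, const int8_t *b, int n) {
    int32_t sum = 0;
    int i = 0;
#if USE_VDOTQ
    /* vdotq_s32 multiplies 16 int8 pairs and adds each group of four
     * products into one of four int32 lanes. */
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 16 <= n; i += 16) {
        acc = vdotq_s32(acc, vld1q_s8(a + i), vld1q_s8(b + i));
    }
    sum = vaddvq_s32(acc); /* horizontal add across the four lanes */
#endif
    /* Scalar path: the whole vector when dotprod is unavailable
     * (e.g. Raspberry Pi 4), otherwise only the leftover tail. */
    for (; i < n; ++i) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}
```

GCC and Clang define `__ARM_FEATURE_DOTPROD` only when the target enables the extension (for example `-march=armv8.2-a+dotprod`), so a build for an older core compiles just the scalar path instead of emitting an illegal instruction.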
Radoslav Gerganov 60f819a2b1
Add section to README on how to run the project on Android (#130) 2023-03-14 15:30:08 +02:00
Georgi Gerganov 97ab2b2578
Add Misc section + update hot topics + minor fixes 2023-03-14 09:43:52 +02:00
Sebastián A 2f700a2738
Add Windows to the CI (#98) 2023-03-13 22:29:10 +02:00
Georgi Gerganov c09a9cfb06
CMake build in Release by default (#75) 2023-03-13 21:22:15 +02:00
Georgi Gerganov 7ec903d3c1
Update contribution section, hot topics, limitations, etc. 2023-03-13 19:21:51 +02:00
Georgi Gerganov 4497ad819c
Print system information 2023-03-13 19:15:08 +02:00
Sebastián A ed6849cc07
Initial support for CMake (#75) 2023-03-13 19:12:33 +02:00
Thomas Klausner 41be0a3b3d
Add NetBSD support. (#90) 2023-03-13 18:40:54 +02:00
Pavol Rusnak 671d5cac15
Use fprintf for diagnostic output (#48)
keep printf only for printing model output

one can now use ./main ... 2>/dev/null to suppress any diagnostic output
2023-03-13 18:39:56 +02:00
Georgi Gerganov 84d9015c4a
Use vdotq_s32 to improve performance (#67)
* 10% performance boost on ARM

* Back to original change
2023-03-13 18:36:44 +02:00
uint256_t 63fd76fbb0
Reduce model loading time (#43)
* Use buffering

* Use vector

* Minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-13 18:33:43 +02:00
Val Kharitonov 2a20f48efa
Fix UTF-8 handling (including colors) (#79) 2023-03-13 18:24:18 +02:00
Pavol Rusnak d1f224712d
Add quantize script for batch quantization (#92)
* Add quantize script for batch quantization

* Indentation

* README for new quantize.sh

* Fix script name

* Fix file list on Mac OS

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-13 18:15:20 +02:00
Georgi Gerganov 1808ee0500
Add initial contribution guidelines 2023-03-13 09:42:26 +02:00
Matvey Soloviev a169bb889c
Gate signal support on being on a unixoid system. (#74) 2023-03-13 04:08:01 +01:00
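A minimal sketch of gating signal handling on Unix-like systems, assuming `__unix__`, or `__APPLE__` together with `__MACH__`, as the test for a "unixoid" system. The exact condition and handler used in #74 may differ; `install_sigint_handler` and `g_interrupted` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L /* expose sigaction() in strict ISO C modes */
#include <signal.h>
#include <string.h>

/* Set from the handler; sig_atomic_t is safe to write there. */
static volatile sig_atomic_t g_interrupted = 0;

#if defined(__unix__) || (defined(__APPLE__) && defined(__MACH__))
#define HAVE_SIGACTION 1
static void sigint_handler(int signo) {
    if (signo == SIGINT) {
        g_interrupted = 1;
    }
}
#else
#define HAVE_SIGACTION 0
#endif

/* Returns 1 if a SIGINT handler was installed, 0 if signal support is
 * compiled out (non-Unix platforms) or installation failed. */
static int install_sigint_handler(void) {
#if HAVE_SIGACTION
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigint_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL) == 0;
#else
    return 0;
#endif
}
```

On other platforms the function compiles to `return 0`, so callers can keep a single code path and simply skip the Ctrl+C behavior.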
Matvey Soloviev 460c482540
Fix token count accounting 2023-03-13 01:04:41 +01:00
Georgi Gerganov c80e2a8f2a
Revert "10% performance boost on ARM"
This reverts commit 113a9e83eb.

There are some reports of illegal instruction errors.
Moved this stuff to the vdotq_s32 branch until resolved.
2023-03-13 01:28:08 +02:00
Georgi Gerganov 54a0e66ea0
Check for vdotq_s32 availability 2023-03-13 01:21:03 +02:00
Georgi Gerganov 543c57e991
Amend previous commit - forgot to update non-QRDMX branch 2023-03-13 01:05:24 +02:00
Georgi Gerganov 113a9e83eb
10% performance boost on ARM 2023-03-13 00:56:10 +02:00
Matvey Soloviev 404fac0d62
Fix color getting reset before prompt output done (#65)
(cherry picked from commit 7eb2987619feee04c40eff69b604017d09919cb6)
2023-03-13 00:07:34 +02:00
Georgi Gerganov 1a0a74300f
Update README.md 2023-03-12 23:39:01 +02:00
Matvey Soloviev 96ea727f47
Add interactive mode (#61)
* Initial work on interactive mode.

* Improve interactive mode. Make rev. prompt optional.

* Update README to explain interactive mode.

* Fix OS X build
2023-03-12 23:13:28 +02:00
Marc Köhlbrugge 9661954835
Fix typo in README (#45) 2023-03-12 22:30:08 +02:00
Ben Garney f385f8dee8
Allow using prompt files (#59) 2023-03-12 22:28:36 +02:00
beiller 02f0c6fe7f
Add back top_k (#56)
* Add back top_k

* Update utils.cpp

* Update utils.h

---------

Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-12 22:23:15 +02:00
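The `top_k` parameter restored in this commit limits sampling to the k highest-scoring tokens. A minimal sketch of the filtering step, assuming raw logits in a float array; `top_k_filter` is a hypothetical name and this is not the code from `utils.cpp`. Ties at the cut-off can keep slightly more than k entries.

```c
#include <math.h>    /* INFINITY */
#include <stdlib.h>  /* malloc, free, qsort */
#include <string.h>  /* memcpy */

/* qsort comparator: sorts floats in descending order. */
static int cmp_desc(const void *pa, const void *pb) {
    float a = *(const float *)pa;
    float b = *(const float *)pb;
    return (a < b) - (a > b);
}

/* Keep the k largest logits and set every other logit to -INFINITY so a
 * subsequent softmax gives those tokens zero probability. k == 0 or
 * k >= n disables filtering. Returns 0 on success, -1 if out of memory. */
static int top_k_filter(float *logits, size_t n, size_t k) {
    if (k == 0 || k >= n) {
        return 0;
    }
    float *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL) {
        return -1;
    }
    memcpy(sorted, logits, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_desc);
    const float threshold = sorted[k - 1]; /* k-th largest value */
    free(sorted);
    for (size_t i = 0; i < n; ++i) {
        if (logits[i] < threshold) {
            logits[i] = -INFINITY;
        }
    }
    return 0;
}
```

Sorting a copy keeps token ids aligned with their positions in `logits`; a partial selection would be faster for large vocabularies but is longer to write.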
Sebastián A eb062bb012
Windows fixes (#31)
* Apply fixes suggested to build on windows

Issue: https://github.com/ggerganov/llama.cpp/issues/22

* Remove unsupported VLAs

* MSVC: Remove features that are only available on MSVC C++20.

* Fix zero initialization of the other fields.

* Change the use of vector for stack allocations.
2023-03-12 22:15:00 +02:00
Georgi Gerganov 7027a97837
Update README.md 2023-03-12 22:09:26 +02:00
Georgi Gerganov 2d555e5b42
Add CI (#60) 2023-03-12 22:08:24 +02:00
Georgi Gerganov 7c9e54e55e
Revert "weights_only" arg - it is causing more trouble than it helps 2023-03-12 20:59:01 +02:00
Oleksandr Nikitin b9bd1d0141
python/pytorch compat notes (#44) 2023-03-12 14:16:33 +02:00
beiller 129c7d1ea8
Add repetition penalty (#20)
* Adding repeat penalization

* Update utils.h

* Update utils.cpp

* Numeric fix

Should probably still scale by temp even if penalized

* Update comments, more proper application

I see that numbers can go negative, so this applies a fix from a referenced commit

* Minor formatting

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-12 11:27:42 +02:00
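The notes in this commit raise two details: penalized logits should still be scaled by the temperature, and because logits can be negative, simply dividing by the penalty would raise a negative logit instead of lowering it. A common way to handle both, sketched here assuming a penalty greater than 1 discourages repeats, is to divide positive logits by the penalty, multiply negative ones by it, and scale everything by 1/temperature. `apply_repeat_penalty` and `in_window` are hypothetical names; the exact code in `utils.cpp` may differ.

```c
#include <stddef.h>

/* Returns 1 if token id appears in the recent-token window. */
static int in_window(int id, const int *last_tokens, size_t n_last) {
    for (size_t j = 0; j < n_last; ++j) {
        if (last_tokens[j] == id) {
            return 1;
        }
    }
    return 0;
}

/* Apply temperature scaling and a repetition penalty in one pass.
 * Each token in the window is penalized once, even if it occurs
 * several times. */
static void apply_repeat_penalty(float *logits, size_t n_vocab,
                                 const int *last_tokens, size_t n_last,
                                 float penalty, float temperature) {
    const float scale = 1.0f / temperature;
    for (size_t i = 0; i < n_vocab; ++i) {
        float x = logits[i] * scale;
        if (in_window((int)i, last_tokens, n_last)) {
            /* Dividing a negative logit would raise it, so multiply instead. */
            x = (x < 0.0f) ? x * penalty : x / penalty;
        }
        logits[i] = x;
    }
}
```

With penalty 2 and temperature 1, recently seen logits `2` and `-2` become `1` and `-4`: both move toward lower probability, which is the point of the sign check.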
Georgi Gerganov 702fddf5c5
Clarify meaning of hacking 2023-03-12 09:03:25 +02:00
Georgi Gerganov 7d86e25bf6
README: add "Supported platforms" + update hot topics 2023-03-12 08:41:54 +02:00
deepdiffuser a93120236f
use weights_only in conversion script (#32)
This prevents malicious weights from executing arbitrary code by restricting the unpickler to loading only tensors, primitive types, and dictionaries.
2023-03-12 08:36:35 +02:00
Pavol Rusnak 6a9a67f0be
Add LICENSE (#21) 2023-03-12 08:36:03 +02:00
Georgi Gerganov da1a4ff01f
Update README.md 2023-03-12 01:26:32 +02:00
Juraj Bednar 6b2cb6302f
Fix a typo in model name (#16) 2023-03-11 19:32:20 +02:00
Georgi Gerganov 4235e3d5b3
Update README.md 2023-03-11 18:10:18 +02:00
Georgi Gerganov f1eaff4721
Add AVX2 support for x86 architectures thanks to @Const-me! 2023-03-11 18:04:25 +02:00
Georgi Gerganov a9e58529ea
Fix uninitialized FP16 tables on x86 (#15, #2) 2023-03-11 17:40:14 +02:00
Georgi Gerganov 7d9ed7b25f
Bump memory buffer 2023-03-11 12:45:01 +02:00
Georgi Gerganov 0c6803321c
Update README.md 2023-03-11 12:31:21 +02:00
Georgi Gerganov f60fa9e50a
.gitignore models/ 2023-03-11 12:27:02 +02:00