3170 Commits

Author SHA1 Message Date
Christian Zhou-Zheng f7ecd99691 appease linter 2024-06-09 13:09:05 -04:00
Christian Zhou-Zheng 5a96b8f27f remove SplitStrategy, SplitArguments 2024-06-09 13:08:06 -04:00
Christian Zhou-Zheng 0471f67f4f cleanup round 1 2024-06-09 12:40:02 -04:00
Christian Zhou-Zheng 49b9fbe942 actually make the linter happy 2024-06-09 11:37:56 -04:00
Christian Zhou-Zheng a234bf821b fix linting 2024-06-09 11:23:55 -04:00
Christian Zhou-Zheng 0779f2f74f tidy up 2024-06-09 11:20:14 -04:00
Christian Zhou-Zheng 69d6e7a8e9 Merge branch 'master' into convert-split 2024-06-09 11:14:02 -04:00
Christian Zhou-Zheng ba1be979eb fix ti data messiness 2024-06-09 11:10:33 -04:00
Christian Zhou-Zheng ff2dd7d30d try to refactor kv data (still fails) 2024-06-09 10:29:47 -04:00
mgroeber9110 3e2ee44315
server: do not remove whitespace at the start of a completion chunk (#7830) 2024-06-09 20:50:35 +10:00
Johannes Gäßler 42b53d192f
CUDA: revise q8_1 data layout for mul_mat_q (#7824) 2024-06-09 09:42:25 +02:00
sasha0552 2decf57bc6
convert-hf : set the model name based on cli arg, if present (#7693)
`--model-name` argument was added a while ago but did not do anything.
This commit fixes this issue and enables this feature.
2024-06-09 16:39:25 +10:00
Christian Zhou-Zheng 97dd416903 kv/ti data are still wrong 2024-06-09 00:34:36 -04:00
Christian Zhou-Zheng 03cc9bcbe8 use simplification from #7827 2024-06-08 23:14:26 -04:00
Christian Zhou-Zheng 666bb097a2 Merge branch 'master' into convert-split 2024-06-08 23:06:18 -04:00
Christian Zhou-Zheng 282e71fb39 edit cmd line args 2024-06-08 23:00:42 -04:00
compilade 5795b94182
convert-hf : match model part name prefix and suffix (#7687)
In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part we simply used the same logic as the part count to get the part names. 

But this doesn't always work correctly, like when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present.

This commit matches both the prefix and the suffix of the model part names, which should fix this problem without breaking any previously supported upstream models. According to a report by @teleprint-me, some problem still persists, but this will do in the meantime.
2024-06-09 12:47:25 +10:00
compilade ed9f252118
gguf-py : decouple adding metadata from writing in GGUFWriter (#7827)
The main change of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value.

In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False.

Also, GGUFWriter no longer requires the output file name until it actually writes to the file.

And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata.
2024-06-09 12:34:29 +10:00
slaren fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
Christian Zhou-Zheng 079dfe3a8c
Update convert-hf-to-gguf.py
Co-authored-by: compilade <git@compilade.net>
2024-06-08 15:42:17 -04:00
Olivier Chafik d4d915d351
url: save -mu downloads to new cache location (#7826)
* url: save -mu download to new cache location

* url: fs_get_cache_file_path util

* url: tweak sig of fs_get_cache_file
2024-06-08 21:21:08 +02:00
Christian Zhou-Zheng f658e91f4a comma consistency 2024-06-08 08:10:12 -04:00
sasha0552 7a16ce7db2
server : smart slot selection using Longest Common Prefix (#7728)
* server : Smart selection of available slot using Longest Common Substring

* add usage

* remove trailing whitespaces

* Use Longest Common Prefix (LCP) instead of LCS

* Rename argument
2024-06-08 10:50:31 +03:00
Christian Zhou-Zheng 02be0dd654 attempt 3 to appease the linter 2024-06-07 21:26:40 -04:00
Christian Zhou-Zheng 891b19cb81 attempt 2 to appease the linter 2024-06-07 21:20:46 -04:00
Christian Zhou-Zheng 2e70fa1055 attempt to appease the linter 2024-06-07 21:18:30 -04:00
Christian Zhou-Zheng c6ae1d6799 reinstate original gguf package import and fix type annotation 2024-06-07 21:09:03 -04:00
Christian Zhou-Zheng 9576965ce7 examples/convert-legacy-llama.py: restore executable file permission 2024-06-07 20:51:22 -04:00
Francis Couture-Harpin e093dfba9f convert-hf : restore executable file permission 2024-06-07 17:31:35 -04:00
Christian Zhou-Zheng dc5cf5fd82
Update gguf-py/gguf/gguf_writer_split.py
Co-authored-by: compilade <git@compilade.net>
2024-06-07 17:26:30 -04:00
Christian Zhou-Zheng 0283fc1771 fix line endings 2024-06-07 17:24:27 -04:00
Christian Zhou-Zheng 5f29d4a617 fix convert-hf-to-gguf.py permissions 2024-06-07 17:19:01 -04:00
Christian Zhou-Zheng 1312e287ec
Update gguf-py/gguf/constants.py
Co-authored-by: compilade <git@compilade.net>
2024-06-07 17:10:51 -04:00
slaren da799b4189
vulkan : reuse parent extra for views (#7806)
* vulkan : reuse parent extra for views

* Fix validation error when multiple compute contexts are used in a graph

---------

Co-authored-by: 0cc4m <picard12@live.de>
2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng 6d3a256d1d rename GGUFManager to GGUFWriterSplit 2024-06-07 09:12:44 -04:00
Christian Zhou-Zheng c00fad71e5
gguf-split : change binary multi-byte units to decimal (#7803) 2024-06-07 15:56:01 +03:00
intelmatt 27615f5ab2
cmake : fix BUILD_SHARED_LIBS=ON build (#7784)
common depends on pthreads in Linux
2024-06-07 15:15:07 +03:00
Johannes Gäßler 7027b27d76
server: update cache_prompt documentation [no ci] (#7745) 2024-06-07 11:15:49 +02:00
woodx a5cabd7649
server : do not get prompt in infill mode (#7286)
* avoid to get prompt in infill mode and embedding mode

* remove embedding mode

* refactor format

---------

Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
pengxin99 d5c938cd77
[SYCL] fix softmax r2r result wrong issue (#7811) 2024-06-07 14:28:26 +08:00
slaren c9ee7118d5
check for nans in imatrix and quantize (#7807)
* imatrix : detect nan/inf values

* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
Georgi Gerganov ee459f40f6
server : fix --threads-http arg (#7801) 2024-06-06 19:19:59 +03:00
Christian Zhou-Zheng 13ffe22ca7 base-1024 bytes to base-1000 2024-06-06 10:24:11 -04:00
Georgi Gerganov f83351f9a6
imatrix : migrate to gpt_params (#7771)
* imatrix : migrate to gpt_params

ggml-ci

* imatrix : add --save-frequency cli arg

* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Clint Herron ad675e1c67
Added support for . (any character) token in grammar engine. (#6467)
* Added support for . (any character) token in grammar engine.

* Add integration tests for any-character symbol.
2024-06-06 06:08:52 -07:00
Christian Zhou-Zheng 83e4a3f5cc make pathlib explicit 2024-06-06 09:00:59 -04:00
Christian Zhou-Zheng 2037eabb64 move kv keys to constants.py 2024-06-06 08:49:46 -04:00
Christian Zhou-Zheng 1cbab22225 type consistency in format_n_bytes_to_str 2024-06-06 08:43:26 -04:00
Christian Zhou-Zheng 3328b0a991 Shard dataclass and un-negative dont_add_architecture 2024-06-06 08:37:35 -04:00
Christian Zhou-Zheng 6a05183b97
GGUFWriter compatibility fix
Co-authored-by: compilade <git@compilade.net>
2024-06-06 08:28:10 -04:00