3173 Commits

Author SHA1 Message Date
Olivier Chafik e474ef1df4 update llama-rpc-server bin name + doc 2024-06-11 14:42:03 +01:00
Olivier Chafik ee3a086fdf
Merge pull request #2 from HanClinto/bins-nits-2
Bins nits again
2024-06-11 02:36:25 +01:00
ochafik 166397f1e4 update grammar/README.md w/ new llama-* names 2024-06-11 02:35:30 +01:00
ochafik 2a9c4cd7ba Merge remote-tracking branch 'origin/master' into bins 2024-06-11 02:35:01 +01:00
Olivier Chafik b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866) 2024-06-11 02:22:57 +01:00
Olivier Chafik 396b18dfec
`json`: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841)
* json: fix char pattern in grammar converters

* json: prevent number precision & whitespace runaways in example grammars

* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
ochafik 8cf8c129d4 Update apps.nix 2024-06-11 00:18:47 +01:00
HanClinto 1f5ec2c0b4 Updating two small `main` references missed earlier in the finetune docs. 2024-06-10 16:12:50 -07:00
Olivier Chafik 82df7f9f0e
Merge pull request #1 from HanClinto/bins-rename-nits
Nits found in binary renames
2024-06-10 23:58:12 +01:00
HanClinto 70de0debab Updating documentation references for lookup-merge and export-lora 2024-06-10 15:32:21 -07:00
Jared Van Bortel 864a99e7a0
cmake : fix CMake requirement for CUDA (#7821) 2024-06-10 18:32:10 -04:00
HanClinto 72660c357c Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
2024-06-10 15:23:32 -07:00
HanClinto 2fd66b2ce2 Updating a few lingering doc references for rename of main to llama-cli 2024-06-10 14:53:23 -07:00
HanClinto e7e03733b2 Updating docs for eval-callback binary to use new `llama-` prefix. 2024-06-10 14:44:46 -07:00
ochafik 0be5f399c4 add two missing llama- prefixes 2024-06-10 22:00:28 +01:00
Olivier Chafik f9cfd04bd4 address gbnf-validator unused fread warning (switched to C++ / ifstream) 2024-06-10 17:38:36 +01:00
Olivier Chafik b8436395b4 rename: llama-cli-cmake-pkg(.exe) 2024-06-10 16:23:45 +01:00
Olivier Chafik 4881a94bee fix test-eval-callback 2024-06-10 16:21:14 +01:00
Olivier Chafik b8cb44e812 more llama-cli(.exe) 2024-06-10 16:08:06 +01:00
Olivier Chafik 051633ed2d update dockerfile refs 2024-06-10 16:05:11 +01:00
Olivier Chafik 1cc651446d rename(make): llama-baby-llama 2024-06-10 16:03:18 +01:00
Olivier Chafik 0fcf2c328e rename dockerfile w/ llama-cli 2024-06-10 15:44:49 +01:00
Olivier Chafik 0bb2a3f233 fix some missing -cli suffixes 2024-06-10 15:42:20 +01:00
Olivier Chafik daeaeb1222 Merge remote-tracking branch 'origin/master' into bins 2024-06-10 15:38:41 +01:00
Olivier Chafik 5265c15d4c rename llama|main -> llama-cli; consistent RPM bin prefixes 2024-06-10 15:34:14 +01:00
slaren fd5ea0f897
ci : try win-2019 on server windows test (#7854) 2024-06-10 15:18:41 +03:00
Georgi Gerganov c28a83902c
examples : remove --instruct remnants (#7846) 2024-06-10 15:00:15 +03:00
Georgi Gerganov d9da0e4986
server : improve "prompt" handling (#7847) 2024-06-10 14:59:55 +03:00
Johannes Gäßler 1f0dabda8d
CUDA: use tensor cores for MMQ (#7676)
* CUDA: int8 tensor cores for MMQ (legacy quants)

* fix out-of-bounds writes

* __builtin_assume -> GGML_CUDA_ASSUME

* fix writeback returning too early
2024-06-10 11:45:13 +02:00
Ben Ashbaugh af4ae502dd
use the correct SYCL context for host USM allocations (#7777)
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
2024-06-10 10:21:31 +01:00
Georgi Gerganov 10ceba354a
flake.lock: Update (#7838)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)
  → 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-09 16:04:50 -07:00
Georgi Gerganov e95beeb1fc
imatrix : handle partial entries (#7833) 2024-06-09 20:19:35 +03:00
Nicolás Pérez 57bf62ce7c
docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700)
This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci], and how to format PR titles and descriptions.

Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00
mgroeber9110 3e2ee44315
server: do not remove whitespace at the start of a completion chunk (#7830) 2024-06-09 20:50:35 +10:00
Johannes Gäßler 42b53d192f
CUDA: revise q8_1 data layout for mul_mat_q (#7824) 2024-06-09 09:42:25 +02:00
sasha0552 2decf57bc6
convert-hf : set the model name based on cli arg, if present (#7693)
`--model-name` argument was added a while ago but did not do anything.
This commit fixes this issue and enables this feature.
2024-06-09 16:39:25 +10:00
compilade 5795b94182
convert-hf : match model part name prefix and suffix (#7687)
In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part, we simply used the same logic as the part count to get the part names.

But this doesn't always work correctly, like when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present.

Matching both the prefix and the suffix of the model part names, as this commit does, should fix this problem without breaking any previously-supported upstream models. According to a report by @teleprint-me, some problem still persists, but this should do in the meantime.
2024-06-09 12:47:25 +10:00
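
The prefix-and-suffix matching described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `get_model_part_names` and its signature are assumptions, and it works on a plain list of file names rather than a model directory.

```python
def get_model_part_names(filenames: list[str], prefix: str, suffix: str) -> list[str]:
    """Hypothetical helper: keep only files that match BOTH the prefix and the suffix."""
    # Matching on the suffix alone would also pick up extra files such as
    # consolidated.safetensors; requiring the prefix as well filters them out.
    parts = [
        name for name in filenames
        if name.startswith(prefix) and name.endswith(suffix)
    ]
    # Sort so that parts like model-00001-of-00002 come in a stable order.
    return sorted(parts)


if __name__ == "__main__":
    files = [
        "consolidated.safetensors",
        "model-00002-of-00002.safetensors",
        "config.json",
        "model-00001-of-00002.safetensors",
    ]
    print(get_model_part_names(files, "model", ".safetensors"))
```

With a single-part model, the same call keeps `model.safetensors` and skips `consolidated.safetensors`, which is the case the commit message singles out.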
compilade ed9f252118
gguf-py : decouple adding metadata from writing in GGUFWriter (#7827)
The main change of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value.

In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False.

Also, GGUFWriter no longer requires the output file name until actually writing to it.

And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata.
2024-06-09 12:34:29 +10:00
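
The general idea of decoupling metadata collection from writing, as described in the commit message above, can be illustrated with a toy class. This is only a sketch under stated assumptions: `DeferredMetadataWriter` is a hypothetical stand-in and not the gguf-py `GGUFWriter`; its method signatures, the JSON output, and the duplicate-key check are all choices made for this illustration.

```python
import json
from pathlib import Path
from typing import Any


class DeferredMetadataWriter:
    """Toy stand-in (NOT gguf-py's GGUFWriter): collect metadata first, write later."""

    def __init__(self) -> None:
        # No output path is required at construction time.
        self.kv_data: dict[str, Any] = {}

    def add_key_value(self, key: str, value: Any) -> None:
        # A single call records the key together with its value; nothing is written yet.
        if key in self.kv_data:
            raise ValueError(f"duplicate key: {key!r}")
        self.kv_data[key] = value

    def write_to_file(self, path: Path) -> None:
        # The data is serialized only here, once a path is finally supplied.
        path.write_text(json.dumps(self.kv_data, sort_keys=True), encoding="utf-8")
```

Because the path is only passed to `write_to_file`, a caller can build up the metadata before deciding where (or whether) to write it.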
slaren fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
Olivier Chafik d4d915d351
url: save -mu downloads to new cache location (#7826)
* url: save -mu download to new cache location

* url: fs_get_cache_file_path util

* url: tweak sig of fs_get_cache_file
2024-06-08 21:21:08 +02:00
Olivier Chafik 347f30803f rename Dockerfiles 2024-06-08 15:10:32 +01:00
Olivier Chafik 78eae7f3ba gitignore /llama-* 2024-06-08 14:29:35 +01:00
Olivier Chafik efaa441233 fix llama-lookup-* Makefile rules 2024-06-08 14:26:11 +01:00
Olivier Chafik b0eb3b88e9 rm bin files 2024-06-08 14:16:32 +01:00
Olivier Chafik eef922e02e sort cmake example subdirs 2024-06-08 14:09:28 +01:00
Olivier Chafik b648243496 add/fix gbnf-validator subfolder to cmake 2024-06-08 14:07:56 +01:00
Olivier Chafik 81222f02db prefix more cmake targets w/ llama- 2024-06-08 14:05:34 +01:00
Olivier Chafik 10650b692d rename {main->llama}-cmake-pkg binary 2024-06-08 13:57:06 +01:00
Olivier Chafik 78bca8cb07 fix main refs 2024-06-08 13:52:03 +01:00
Olivier Chafik ab5efbb3b6 Prefix all example bins w/ llama- 2024-06-08 13:42:01 +01:00