Commit Graph

3184 Commits

Author SHA1 Message Date
ngxson 334dbaed3f shorten help msg 2024-06-12 17:13:19 +02:00
ngxson c59bfa6368 add print_usage 2024-06-12 17:12:02 +02:00
ngxson b22c8459ff clean up a bit 2024-06-12 16:08:27 +02:00
ngxson a2a5f1bfbd better error handling 2024-06-12 16:01:00 +02:00
ngxson 679f5137f8 move param parser to common 2024-06-12 15:58:20 +02:00
ngxson f54cb8e307 reuse allocr 2024-06-12 12:53:17 +02:00
ngxson 8ee0c96688 fix compile warn 2024-06-12 12:50:29 +02:00
ngxson e683b9af60 attempt to fix compile problem on mac 2024-06-12 12:49:01 +02:00
ngxson 7297817d13 use ggml_backend_tensor_copy 2024-06-12 11:41:37 +02:00
ngxson e9cb3b336d fix .editorconfig 2024-06-11 22:09:14 +02:00
ngxson 5ffba9ecc3 add readme 2024-06-11 19:35:17 +02:00
ngxson 04c91d29ff use ggml_format_name 2024-06-11 19:14:04 +02:00
ngxson 54f77e2467 add to makefile all targets 2024-06-11 19:03:13 +02:00
ngxson 85db22dd20 Merge branch 'master' into xsn/control-vector-generator 2024-06-11 19:00:19 +02:00
Deven Mistry 14f83526cd
fix broken link in pr template (#7880) [no ci]
* fix broken link in pr template

* Update pull_request_template.md [no ci]

---------

Co-authored-by: Brian <mofosyne@gmail.com>
2024-06-12 02:18:58 +10:00
Brian 6fe42d073f
github: move PR template to .github/ root (#7868) 2024-06-11 17:43:41 +03:00
ngxson da6babdf0a fix macos build 2024-06-11 15:47:35 +02:00
ngxson 3223133cf5 default n_pca_batch to 20 2024-06-11 15:05:06 +02:00
Johannes Gäßler 148995e5e5
llama-bench: more compact markdown tables (#7879) 2024-06-11 14:45:40 +02:00
ngxson d41c719980 bring back n_completions 2024-06-11 14:31:45 +02:00
Christian Zhou-Zheng 446da906d9 fix n_completions 2024-06-11 08:22:38 -04:00
ngxson 163916864c remember to copy back the last_eigenvector 2024-06-11 12:40:07 +02:00
ngxson 1a088fb0a5 working version 2024-06-11 12:37:05 +02:00
ngxson 9e39571fc2 add n_batch for pca 2024-06-11 11:45:16 +02:00
Georgi Gerganov 4bfe50f741
tests : check the Python version (#7872)
ggml-ci
2024-06-11 10:10:20 +03:00
Johannes Gäßler bdcb8f4222
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860) 2024-06-11 08:26:07 +02:00
slaren c2ce6c47e4
fix CUDA CI by using a windows-2019 image (#7861)
* try to fix CUDA ci with --allow-unsupported-compiler

* trigger when build.yml changes

* another test

* try exllama/bdashore3 method

* install vs build tools before cuda toolkit

* try win-2019
2024-06-11 08:59:20 +03:00
Olivier Chafik b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866) 2024-06-11 02:22:57 +01:00
Olivier Chafik 396b18dfec
`json`: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841)
* json: fix char pattern in grammar converters

* json: prevent number precision & whitespace runaways in example grammars

* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
ngxson 6a5adf3d7c fix shape of v_diff_original 2024-06-11 01:33:16 +02:00
ngxson c241b500a1 clean up PCA ggml implementation 2024-06-11 01:13:10 +02:00
Jared Van Bortel 864a99e7a0
cmake : fix CMake requirement for CUDA (#7821) 2024-06-10 18:32:10 -04:00
slaren fd5ea0f897
ci : try win-2019 on server windows test (#7854) 2024-06-10 15:18:41 +03:00
Georgi Gerganov c28a83902c
examples : remove --instruct remnants (#7846) 2024-06-10 15:00:15 +03:00
Georgi Gerganov d9da0e4986
server : improve "prompt" handling (#7847) 2024-06-10 14:59:55 +03:00
Johannes Gäßler 1f0dabda8d
CUDA: use tensor cores for MMQ (#7676)
* CUDA: int8 tensor cores for MMQ (legacy quants)

* fix out-of-bounds writes

* __builtin_assume -> GGML_CUDA_ASSUME

* fix writeback returning too early
2024-06-10 11:45:13 +02:00
Ben Ashbaugh af4ae502dd
use the correct SYCL context for host USM allocations (#7777)
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
2024-06-10 10:21:31 +01:00
Georgi Gerganov 10ceba354a
flake.lock: Update (#7838)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)
  → 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-09 16:04:50 -07:00
Georgi Gerganov e95beeb1fc
imatrix : handle partial entries (#7833) 2024-06-09 20:19:35 +03:00
Nicolás Pérez 57bf62ce7c
docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700)
This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate the PR complexity level, when to add [no ci], and how to format PR titles and descriptions.

Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00
mgroeber9110 3e2ee44315
server: do not remove whitespace at the start of a completion chunk (#7830) 2024-06-09 20:50:35 +10:00
Johannes Gäßler 42b53d192f
CUDA: revise q8_1 data layout for mul_mat_q (#7824) 2024-06-09 09:42:25 +02:00
sasha0552 2decf57bc6
convert-hf : set the model name based on cli arg, if present (#7693)
`--model-name` argument was added a while ago but did not do anything.
This commit fixes this issue and enables this feature.
2024-06-09 16:39:25 +10:00
compilade 5795b94182
convert-hf : match model part name prefix and suffix (#7687)
In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part, we simply used the same logic as the part count to get the part names.

But this doesn't always work correctly, for example when unusual additional model files such as consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present.

Matching both the prefix and the suffix of the model part names should fix this problem without breaking any previously supported upstream models. According to a report by @teleprint-me, some problem still persists, but this should do in the meantime.
2024-06-09 12:47:25 +10:00
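
The prefix-and-suffix matching described in the commit above can be illustrated with a minimal sketch. The function name `get_part_names` and its default arguments are hypothetical and chosen only for illustration; this is not the code from the commit.

```python
from typing import Iterable, List


def get_part_names(filenames: Iterable[str],
                   prefix: str = "model",
                   suffix: str = ".safetensors") -> List[str]:
    """Hypothetical helper: keep only files whose name starts with `prefix`
    and ends with `suffix`, so that extra files such as
    consolidated.safetensors are not counted as model parts."""
    return sorted(
        name for name in filenames
        if name.startswith(prefix) and name.endswith(suffix)
    )


files = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "consolidated.safetensors",  # same suffix, but different prefix
    "config.json",
]
parts = get_part_names(files)
```

Matching on the suffix alone would also pick up consolidated.safetensors; requiring the prefix as well excludes it while still accepting both model.safetensors and the numbered part names.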
compilade ed9f252118
gguf-py : decouple adding metadata from writing in GGUFWriter (#7827)
The main change in this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value.

In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False.

Also, GGUFWriter no longer requires an output file name until actually writing to it.

And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata.
2024-06-09 12:34:29 +10:00
slaren fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
Olivier Chafik d4d915d351
url: save -mu downloads to new cache location (#7826)
* url: save -mu download to new cache location

* url: fs_get_cache_file_path util

* url: tweak sig of fs_get_cache_file
2024-06-08 21:21:08 +02:00
sasha0552 7a16ce7db2
server : smart slot selection using Longest Common Prefix (#7728)
* server : Smart selection of available slot using Longest Common Substring

* add usage

* remove trailing whitespaces

* Use Longest Common Prefix (LCP) instead of LCS

* Rename argument
2024-06-08 10:50:31 +03:00
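
The idea behind slot selection by Longest Common Prefix in the commit above can be sketched as follows. This is a simplified illustration under assumptions: the names `common_prefix_len` and `pick_slot` are hypothetical, slots are modeled as plain lists of token ids, and any additional criteria the real server may apply are ignored.

```python
from typing import Optional, Sequence


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens the two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_slot(slot_caches: Sequence[Sequence[int]],
              prompt: Sequence[int]) -> Optional[int]:
    """Hypothetical selector: return the index of the slot whose cached
    tokens share the longest prefix with the new prompt, or None if
    there are no slots. Ties go to the lowest index."""
    best_idx, best_len = None, -1
    for idx, cached in enumerate(slot_caches):
        n = common_prefix_len(cached, prompt)
        if n > best_len:
            best_idx, best_len = idx, n
    return best_idx


slots = [[1, 2, 3], [1, 2, 9, 9], [7]]
chosen = pick_slot(slots, [1, 2, 3, 4])
```

A prefix match is the relevant measure here because only the leading tokens that agree with the cached sequence can be reused without recomputation; a shared substring in the middle of the prompt would not help in the same way.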
slaren da799b4189
vulkan : reuse parent extra for views (#7806)
* vulkan : reuse parent extra for views

* Fix validation error when multiple compute contexts are used in a graph

---------

Co-authored-by: 0cc4m <picard12@live.de>
2024-06-07 19:47:49 +02:00
ngxson a710df749c (wip) refactor 2024-06-07 15:37:58 +02:00