Commit Graph

3729 Commits

Author SHA1 Message Date
Farbod Bijary 67155ab7f5
feat: Implements retrying logic for downloading models using --model-url flag (#9255)
* feat: Implements retrying logic for downloading models using --model-url flag

* Update common/common.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* Update common/common.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* apply comments

* implements a retry function to avoid duplication

* fix editorconfig

* change function name

---------

Co-authored-by: farbod <farbod.bjary82@gmail.com>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-09-11 11:22:37 +02:00
Johannes Gäßler 5af118efda
CUDA: fix --split-mode row race condition (#9413) 2024-09-11 10:22:40 +02:00
Georgi Gerganov d2b496bff4
batched-bench : remove unused code (#9305) 2024-09-11 10:03:54 +03:00
R0CKSTAR b34e023480
musa: remove Clang builtins mapping (#9421)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-09-11 03:46:55 +02:00
Alberto Cabrera Pérez 51b6038636
sycl : update support conditions (#9394)
* sycl : update support condition to im2col

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

* Added TODO to remind supporting FP32 im2col

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
2024-09-11 08:53:42 +08:00
Georgi Gerganov cb9c933eb2
flake.lock: Update (#9360)
Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30)
  → 'github:hercules-ci/flake-parts/567b938d64d4b4112ee253b9274472dc3a346eb6?narHash=sha256-%2Bebgonl3NbiKD2UD0x4BszCZQ6sTfL4xioaM49o5B3Y%3D' (2024-09-01)
• Updated input 'flake-parts/nixpkgs-lib':
    'a5d394176e.tar.gz?narHash=sha256-uFf2QeW7eAHlYXuDktm9c25OxOyCoUOQmh5SZ9amE5Q%3D' (2024-08-01)
  → '356624c120.tar.gz?narHash=sha256-Ss8QWLXdr2JCBPcYChJhz4xJm%2Bh/xjl4G0c0XlP6a74%3D' (2024-09-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/71e91c409d1e654808b2621f28a327acfdad8dc2?narHash=sha256-GnR7/ibgIH1vhoy8cYdmXE6iyZqKqFxQSVkFgosBh6w%3D' (2024-08-28)
  → 'github:NixOS/nixpkgs/574d1eac1c200690e27b8eb4e24887f8df7ac27c?narHash=sha256-v3rIhsJBOMLR8e/RNWxr828tB%2BWywYIoajrZKFM%2B0Gg%3D' (2024-09-06)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-09-10 15:46:59 -07:00
Xuan Son Nguyen 6cd4e03444
arg : bring back missing ifdef (#9411)
* arg : bring back missing ifdef

* replace with llama_supports_gpu_offload
2024-09-10 22:41:29 +02:00
matteo 8d300bd35f
enable --special arg for llama-server (#9419)
Co-authored-by: matteo serva <matteo.serva@gmail.com>
2024-09-10 22:40:59 +02:00
slaren 49006c67b4
llama : move random seed generation to the samplers (#9398)
* llama_sampler_penalties : clamp penalty_last_n to zero
2024-09-10 18:04:25 +02:00
Georgi Gerganov 00ba2ff781
metal : fix compile warning with GGML_METAL_NDEBUG (#0) 2024-09-10 10:17:43 +03:00
Daniel Bevenius 83008b7cfe
llama : update llm_build_copy_mask_state comment [no ci] (#9385)
This commit updates the comment, which seems to contain a typo or be an
outdated comment, in the copy_mask_state function changing the variable
n_rs to n_kv.

I believe this change is correct and what the comment wants to
convey is to copy the states that are not going to be used in the
upcoming processing, which are the tokens states from n_seqs up to
the number of possible token states n_kv.
2024-09-10 10:03:21 +03:00
Molly Sophia 0b4ac75772
RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-09-10 10:02:30 +03:00
slaren fb3f249815
make : do not run llama-gen-docs when building (#9399) 2024-09-10 09:23:33 +03:00
Xuan Son Nguyen bfe76d4a17
common : move arg parser code to `arg.cpp` (#9388)
* common : move arg parser to arg.cpp

* better categorize args

* add cmake

* missing climits

* missing cstdarg

* common : more explicit includes

* fix build

* refactor gpt_params_parse

* update server readme

* fix test

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-09 23:36:09 +02:00
Radoslav Gerganov 293bebe077
rpc : fix segfault with nkvo (#9389)
* rpc : fix nkvo

* rpc : buf_size must not be static

ref: #9337

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-09-09 18:40:10 +03:00
Prashant Vithule 5fac4d5764
ggml : vector length agnostic SVE support (#9290)
* Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths

* Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths

* Removed WhiteSpaces

* ggml : style changes + fix 512-bit nb loop check

- fix local scope in switch cases
- consistent predicate names
- empty lines when necessary
- opening braces, spaces
- const-correctness
- add asserts

* Update ggml/src/ggml-quants.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-09 18:37:18 +03:00
slaren 5fb5e24811
llama : minor sampling refactor (2) (#9386) 2024-09-09 17:10:46 +02:00
Georgi Gerganov 38ca6f644b
readme : update hot topics 2024-09-09 15:51:37 +03:00
Johannes Gäßler 8e6e2fbe14
CUDA: fix variable name conflict for Windows build (#9382) 2024-09-09 14:22:53 +02:00
Antonis Makropoulos 5ed087573e
readme : add LLMUnity to UI projects (#9381)
* add LLMUnity to UI projects

* add newline to examples/rpc/README.md to fix editorconfig-checker unit test
2024-09-09 14:21:38 +03:00
Radoslav Gerganov 54f376d0b9
rpc : update README [no ci] (#9320)
Update README with instructions how to offload model layers to both
local and remote devices
2024-09-09 11:04:39 +03:00
Dan Johansson b2e89a3274
Arm AArch64: Documentation updates (#9321)
* Arm AArch64: Documentation updates

* Update docs/build.md to include information on how to enable the Arm optimized gemm/gemv kernels

* Update examples/quantize/README.md with information on the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats

* Add newline to the end of docs/build.md
2024-09-09 10:02:45 +03:00
Markus Tavenrath daa9623ab0
Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (#9118)
* Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early.

* fix compile issues

* Fix issues where the last submit wasn't executed or handled properly.

* remove trailing whitespace

* Repair GGML_VULKAN_CHECK_RESULTS

* Increase submit counter only if actual work has been submitted and increase submit count to 100.

* Fix some nodes are not checked with GGML_VULKAN_CHECK_RESULTS enabled.
2024-09-08 21:43:48 +02:00
Georgi Gerganov e079bffb66
cuda : fix FA Q src index (1 -> 0) (#9374) 2024-09-08 22:01:02 +03:00
Xuan Son Nguyen 3f7ccfd649
common : bring back missing args, add env var duplication check (#9375)
* common : bring back missing args

* move duplication check to test-arg-parser

* add check for duplicated env var

* correct default values
2024-09-08 18:08:55 +02:00
slaren a249843d89
common : restore --n-gpu-layers (#9371) 2024-09-08 16:44:42 +02:00
slaren 19f4a7b296
llama : refactor samplers internal implementation (#9370) 2024-09-08 15:52:07 +02:00
Neo Zhang Jianyu 2a358fb0c4
[SYCL] add check malloc result on device (#9346)
* add check malloc result on device

* update for review comments, check all malloc_device() result

---------

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-09-08 19:05:29 +08:00
slaren eae597182c
llama : sanitize tokens in the upper bound (#9359) 2024-09-08 12:41:51 +02:00
Xuan Son Nguyen 00b02bb249
imatrix : fix arg parser for imatrix (#9366)
* imatrix : fix arg parser

* beautify printing first arg
2024-09-08 12:12:17 +02:00
Georgi Gerganov a876861455 metal : update support condition for im2col + fix warning (#0) 2024-09-08 11:05:55 +03:00
Georgi Gerganov 385decbd63 sync : ggml 2024-09-08 11:05:55 +03:00
Georgi Gerganov 60a3107ccd scripts : option to increase git patch context 2024-09-08 11:05:55 +03:00
Salvatore Mesoraca 406c1a32a1 vulkan: add dryrun support to sin and cos ops (ggml/947)
sin and cos failed test-backend-ops because they
tried to dereference a context pointer that is null
on dry runs.

This commit prevents that segfault.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-09-08 11:05:55 +03:00
Salvatore Mesoraca 9cb9260861 vulkan: correctly report support for OP_CONT (ggml/946)
test-backend-ops fails because ggml_cont aborts
when invoked passing an unsupported type.

This commit makes ggml_cont tests pass

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-09-08 11:05:55 +03:00
Johannes Gäßler 202084d31d tests: add gradient tests for all backends (ggml/932)
* tests: add gradient checking to test-backend-ops

* remove old comment

* reorder includes

* adjust SIN/COS parameters

* add documentation, use supports_op if possible
2024-09-08 11:05:55 +03:00
Johannes Gäßler dbbebcab33 ggml: fix ggml_graph_cpy undefined behavior (ggml/943) 2024-09-08 11:05:55 +03:00
Georgi Gerganov ba1cf846ed cann : fix doxy (ggml/0) 2024-09-08 11:05:55 +03:00
Mengqing Cao d2d3200b38 cann : add Ascend NPU support (whisper/2336)
* enable Ascend NPU in src/whisper.cpp
  * sync test-backend-ops with llama.cpp
2024-09-08 11:05:55 +03:00
Georgi Gerganov 51d964a4ef cuda : mark BF16 CONT as unsupported 2024-09-08 11:05:55 +03:00
Salvatore Mesoraca efe6a83e30 ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934)
* ggml_cont: fix issue with transposed tensors when one dimension is 1

when using multiple threads, it is not enough
to check for the tensors to be contiguous for
ggml_compute_forward_dup_same_cont to work correctly.
The tensors strides also need to match.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Add ggml_cont tests

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Remove dead code

it isn't possible to reach this code because
all these functions are invoked by ggml_compute_forward_dup
if and only if src0->type != dst->type

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Make ggml_compute_forward_dup_same_cont work with contiguous tensors

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

---------

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-08 11:05:55 +03:00
Kevin Gibbons fbb7fcffbc
llama : set attrs of mislabelled EOT/EOM tokens (#9348) 2024-09-08 08:51:00 +03:00
Georgi Gerganov a5b5d9a101
llama.android : fix build (#9350) 2024-09-08 00:33:50 +03:00
Georgi Gerganov f12295b8a9
llama : fix empty ring buffer push (#9358) 2024-09-08 00:33:33 +03:00
Georgi Gerganov faf69d4237
llama : sanitize invalid tokens (#9357)
* common : do not add null tokens during warmup

ggml-ci

* llama : check that the input tokens are valid

ggml-ci

* tests : fix batch size of bert model

ggml-ci
2024-09-08 00:33:13 +03:00
Eve e536426ded
llamafile : disable sgemm for batch-size 1 (#9330) 2024-09-07 22:02:26 +03:00
Xuan Son Nguyen 1b9ae5189c
common : refactor arg parser (#9308)
* (wip) argparser v3

* migrated

* add test

* handle env

* fix linux build

* add export-docs example

* fix build (2)

* skip build test-arg-parser on windows

* update server docs

* bring back missing --alias

* bring back --n-predict

* clarify test-arg-parser

* small correction

* add comments

* fix args with 2 values

* refine example-specific args

* no more lamba capture

Co-authored-by: slaren@users.noreply.github.com

* params.sparams

* optimize more

* export-docs --> gen-docs
2024-09-07 20:43:51 +02:00
slaren e32d0816ed
ggml : always check bounds on get_rows operations (#9354) 2024-09-07 20:23:07 +02:00
Georgi Gerganov df270ef745
llama : refactor sampling v2 (#9294)
- Add `struct llama_sampler` and `struct llama_sampler_i`
- Add `llama_sampler_` API
- Add `llama_sampler_chain_` API for chaining multiple samplers
- Remove `LLAMA_API_INTERNAL`
- Add `llama_perf_` API and remove old `llama_print_timings` and `llama_reset_timings`
2024-09-07 15:16:19 +03:00
Xuan Son Nguyen 947538acb8
ggml : fix missing `cpu_set_t` on emscripten (#9336)
* ggml : fix missing cpu_set_t on emscripten

* better version

* bring back android part
2024-09-07 12:01:34 +02:00