Commit Graph

969 Commits

Author SHA1 Message Date
ngxson f99be2c3ff disable GPU for PCA 2024-06-13 15:21:49 +02:00
ngxson 91f7dbfda2 typo 2024-06-13 14:55:26 +02:00
ngxson 64cad20c2e change compile target to llama-cvector-generator 2024-06-13 14:51:11 +02:00
ngxson 2f055584cf Merge branch 'master' into xsn/control-vector-generator 2024-06-13 14:33:45 +02:00
ngxson ca86d4fd33 escape prompt by default 2024-06-13 13:29:58 +02:00
slaren f578b86b21
move BLAS to a separate backend (#6210)
* move BLAS to a separate backend

* rename GGML_USE_OPENBLAS to GGML_USE_BLAS

* alloc : reuse same buffer when the same buffer type if used multiple times

* set number of threads automatically for openblas and blis

* sched : print assignments when GGML_SCHED_DEBUG env variable is set

* sched : allow ops with weights on an incompatible buffer type

This will cause the weight to be copied to a backend that supports the
op, which is very costly. The weight should have been stored in a buffer
of a backend that can run the op, but llama.cpp cannot do this
automatically at the moment.

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-13 03:11:35 +02:00
Olivier Chafik 1c641e6aac
`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew

* server: update refs -> llama-server

gitignore llama-server

* server: simplify nix package

* main: update refs -> llama

fix examples/main ref

* main/server: fix targets

* update more names

* Update build.yml

* rm accidentally checked in bins

* update straggling refs

* Update .gitignore

* Update server-llm.sh

* main: target name -> llama-cli

* Prefix all example bins w/ llama-

* fix main refs

* rename {main->llama}-cmake-pkg binary

* prefix more cmake targets w/ llama-

* add/fix gbnf-validator subfolder to cmake

* sort cmake example subdirs

* rm bin files

* fix llama-lookup-* Makefile rules

* gitignore /llama-*

* rename Dockerfiles

* rename llama|main -> llama-cli; consistent RPM bin prefixes

* fix some missing -cli suffixes

* rename dockerfile w/ llama-cli

* rename(make): llama-baby-llama

* update dockerfile refs

* more llama-cli(.exe)

* fix test-eval-callback

* rename: llama-cli-cmake-pkg(.exe)

* address gbnf-validator unused fread warning (switched to C++ / ifstream)

* add two missing llama- prefixes

* Updating docs for eval-callback binary to use new `llama-` prefix.

* Updating a few lingering doc references for rename of main to llama-cli

* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.

* Updating documentation references for lookup-merge and export-lora

* Updating two small `main` references missed earlier in the finetune docs.

* Update apps.nix

* update grammar/README.md w/ new llama-* names

* update llama-rpc-server bin name + doc

* Revert "update llama-rpc-server bin name + doc"

This reverts commit e474ef1df4.

* add hot topic notice to README.md

* Update README.md

* Update README.md

* rename gguf-split & quantize bins refs in **/tests.sh

---------

Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
ngxson c59bfa6368 add print_usage 2024-06-12 17:12:02 +02:00
ngxson b22c8459ff clean up a bit 2024-06-12 16:08:27 +02:00
ngxson a2a5f1bfbd better error handling 2024-06-12 16:01:00 +02:00
ngxson 679f5137f8 move param parser to common 2024-06-12 15:58:20 +02:00
Georgi Gerganov 704a35b183
server : restore numeric prompts (#7883) 2024-06-12 14:42:29 +03:00
ngxson f54cb8e307 reuse allocr 2024-06-12 12:53:17 +02:00
ngxson 8ee0c96688 fix compile warn 2024-06-12 12:50:29 +02:00
ngxson e683b9af60 attemp to fix compile problem on mac 2024-06-12 12:49:01 +02:00
ngxson 7297817d13 use ggml_backend_tensor_copy 2024-06-12 11:41:37 +02:00
ngxson e9cb3b336d fix .editorconfig 2024-06-11 22:09:14 +02:00
ngxson 5ffba9ecc3 add readme 2024-06-11 19:35:17 +02:00
ngxson 04c91d29ff use ggml_format_name 2024-06-11 19:14:04 +02:00
ngxson 54f77e2467 add to makefile all targets 2024-06-11 19:03:13 +02:00
ngxson 85db22dd20 Merge branch 'master' into xsn/control-vector-generator 2024-06-11 19:00:19 +02:00
ngxson da6babdf0a fix macos build 2024-06-11 15:47:35 +02:00
ngxson 3223133cf5 default n_pca_batch to 20 2024-06-11 15:05:06 +02:00
Johannes Gäßler 148995e5e5
llama-bench: more compact markdown tables (#7879) 2024-06-11 14:45:40 +02:00
ngxson d41c719980 bring back n_completions 2024-06-11 14:31:45 +02:00
Christian Zhou-Zheng 446da906d9 fix n_completions 2024-06-11 08:22:38 -04:00
ngxson 163916864c remember to copy back the last_eigenvector 2024-06-11 12:40:07 +02:00
ngxson 1a088fb0a5 working version 2024-06-11 12:37:05 +02:00
ngxson 9e39571fc2 add n_batch for pca 2024-06-11 11:45:16 +02:00
Olivier Chafik b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866) 2024-06-11 02:22:57 +01:00
Olivier Chafik 396b18dfec
`json`: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841)
* json: fix char pattern in grammar converters

* json: prevent number precision & whitespace runaways in example grammars

* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
ngxson 6a5adf3d7c fix shape of v_diff_original 2024-06-11 01:33:16 +02:00
ngxson c241b500a1 clean up PCA ggml implementation 2024-06-11 01:13:10 +02:00
Georgi Gerganov c28a83902c
examples : remove --instruct remnants (#7846) 2024-06-10 15:00:15 +03:00
Georgi Gerganov d9da0e4986
server : improve "prompt" handling (#7847) 2024-06-10 14:59:55 +03:00
Georgi Gerganov e95beeb1fc
imatrix : handle partial entries (#7833) 2024-06-09 20:19:35 +03:00
mgroeber9110 3e2ee44315
server: do not remove whitespace at the start of a completion chunk (#7830) 2024-06-09 20:50:35 +10:00
slaren fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
sasha0552 7a16ce7db2
server : smart slot selection using Longest Common Prefix (#7728)
* server : Smart selection of available slot using Longest Common Substring

* add usage

* remove trailing whitespaces

* Use Longest Common Prefix (LCP) instead of LCS

* Rename argument
2024-06-08 10:50:31 +03:00
ngxson a710df749c (wip) refactor 2024-06-07 15:37:58 +02:00
Christian Zhou-Zheng c00fad71e5
gguf-split : change binary multi-byte units to decimal (#7803) 2024-06-07 15:56:01 +03:00
Johannes Gäßler 7027b27d76
server: update cache_prompt documentation [no ci] (#7745) 2024-06-07 11:15:49 +02:00
woodx a5cabd7649
server : do not get prompt in infill mode (#7286)
* avoid to get prompt in infill mode and embedding mode

* remove embedding mode

* refactor format

---------

Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
slaren c9ee7118d5
check for nans in imatrix and quantize (#7807)
* imatrix : detect nan/inf values

* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
Georgi Gerganov f83351f9a6
imatrix : migrate to gpt_params (#7771)
* imatrix : migrate to gpt_params

ggml-ci

* imatrix : add --save-frequency cli arg

* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Olivier Chafik 55b2d0849d
grammars: x{min,max} repetition operator (#6640)
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates

* grammars: handle `x{n}` and fix `x{n,n}`

* grammars: document new repetition operators

* grammars: uniform use of int for min & max

* grammars: refactor parser test

* grammar: parsing tests w/ natural pretty print of updated expectations

* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)

* grammars: improve test pretty print again

* grammars: pretty print rules and chars

* grammars: fix copy rule skipping

* grammars: disallow `a{,}` (not allowed in regexps)

* Update common/grammar-parser.cpp

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: fix copy rule skipping (again) & display of expectations

* grammars: more test cases

* grammars: update reps parsing to bring ? / * / + closer to before

* json: use new GBNF repetitions{m,n} syntax

* grammars: update performance gotchas w/ repetition advice

* Update examples/json_schema_to_grammar.py

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: comment on rule repetitions

* grammars: ensure unambiguous number alternatives

* grammar: nit typo switched error msgs

* grammar: nit numbering in comment

* json: update numeric rule to be unambiguous

* Apply suggestions from code review

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* json: fix integral-part

* grammar: add repetition tests

---------

Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
Georgi Gerganov 2b3389677a
ggml : refactor rope norm/neox (#7634)
* ggml : unify rope norm/neox (CPU)

* ggml : fix compile warning

* ggml : remove GLM rope mode

ggml-ci

* metal : better rope implementation

ggml-ci

* cuda : better rope implementation

ggml-ci

* naming : n_orig_ctx -> n_ctx_orig

ggml-ci

* dev : add reminders to update backends

ggml-ci

* vulkan : fix ggml_rope_ext() usage

* cuda : fix array size + indents

ggml-ci
2024-06-05 11:29:20 +03:00
arch-btw 9973e81c5c
readme : remove -ins (#7759)
-ins and --instruct were moved in https://github.com/ggerganov/llama.cpp/pull/7675

I have adjusted the README accordingly.
There was no trace of --chatml in the README.
2024-06-05 09:40:49 +03:00
Georgi Gerganov 1442677f92
common : refactor cli arg parsing (#7675)
* common : gpt_params_parse do not print usage

* common : rework usage print (wip)

* common : valign

* common : rework print_usage

* infill : remove cfg support

* common : reorder args

* server : deduplicate parameters

ggml-ci

* common : add missing header

ggml-ci

* common : remote --random-prompt usages

ggml-ci

* examples : migrate to gpt_params

ggml-ci

* batched-bench : migrate to gpt_params

* retrieval : migrate to gpt_params

* common : change defaults for escape and n_ctx

* common : remove chatml and instruct params

ggml-ci

* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
Georgi Gerganov 554c247caf
ggml : remove OpenCL (#7735)
ggml-ci
2024-06-04 21:23:20 +03:00