ngxson
f99be2c3ff
disable GPU for PCA
2024-06-13 15:21:49 +02:00
ngxson
91f7dbfda2
typo
2024-06-13 14:55:26 +02:00
ngxson
64cad20c2e
change compile target to llama-cvector-generator
2024-06-13 14:51:11 +02:00
ngxson
2f055584cf
Merge branch 'master' into xsn/control-vector-generator
2024-06-13 14:33:45 +02:00
ngxson
ca86d4fd33
escape prompt by default
2024-06-13 13:29:58 +02:00
slaren
f578b86b21
move BLAS to a separate backend (#6210)
* move BLAS to a separate backend
* rename GGML_USE_OPENBLAS to GGML_USE_BLAS
* alloc : reuse same buffer when the same buffer type is used multiple times
* set number of threads automatically for openblas and blis
* sched : print assignments when GGML_SCHED_DEBUG env variable is set
* sched : allow ops with weights on an incompatible buffer type
This will cause the weight to be copied to a backend that supports the
op, which is very costly. The weight should have been stored in a buffer
of a backend that can run the op, but llama.cpp cannot do this
automatically at the moment.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-13 03:11:35 +02:00
Olivier Chafik
1c641e6aac
`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew
* server: update refs -> llama-server
gitignore llama-server
* server: simplify nix package
* main: update refs -> llama
fix examples/main ref
* main/server: fix targets
* update more names
* Update build.yml
* rm accidentally checked in bins
* update straggling refs
* Update .gitignore
* Update server-llm.sh
* main: target name -> llama-cli
* Prefix all example bins w/ llama-
* fix main refs
* rename {main->llama}-cmake-pkg binary
* prefix more cmake targets w/ llama-
* add/fix gbnf-validator subfolder to cmake
* sort cmake example subdirs
* rm bin files
* fix llama-lookup-* Makefile rules
* gitignore /llama-*
* rename Dockerfiles
* rename llama|main -> llama-cli; consistent RPM bin prefixes
* fix some missing -cli suffixes
* rename dockerfile w/ llama-cli
* rename(make): llama-baby-llama
* update dockerfile refs
* more llama-cli(.exe)
* fix test-eval-callback
* rename: llama-cli-cmake-pkg(.exe)
* address gbnf-validator unused fread warning (switched to C++ / ifstream)
* add two missing llama- prefixes
* Updating docs for eval-callback binary to use new `llama-` prefix.
* Updating a few lingering doc references for rename of main to llama-cli
* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
* Updating documentation references for lookup-merge and export-lora
* Updating two small `main` references missed earlier in the finetune docs.
* Update apps.nix
* update grammar/README.md w/ new llama-* names
* update llama-rpc-server bin name + doc
* Revert "update llama-rpc-server bin name + doc"
This reverts commit e474ef1df4.
* add hot topic notice to README.md
* Update README.md
* Update README.md
* rename gguf-split & quantize bins refs in **/tests.sh
---------
Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
ngxson
c59bfa6368
add print_usage
2024-06-12 17:12:02 +02:00
ngxson
b22c8459ff
clean up a bit
2024-06-12 16:08:27 +02:00
ngxson
a2a5f1bfbd
better error handling
2024-06-12 16:01:00 +02:00
ngxson
679f5137f8
move param parser to common
2024-06-12 15:58:20 +02:00
Georgi Gerganov
704a35b183
server : restore numeric prompts (#7883)
2024-06-12 14:42:29 +03:00
ngxson
f54cb8e307
reuse allocr
2024-06-12 12:53:17 +02:00
ngxson
8ee0c96688
fix compile warn
2024-06-12 12:50:29 +02:00
ngxson
e683b9af60
attempt to fix compile problem on mac
2024-06-12 12:49:01 +02:00
ngxson
7297817d13
use ggml_backend_tensor_copy
2024-06-12 11:41:37 +02:00
ngxson
e9cb3b336d
fix .editorconfig
2024-06-11 22:09:14 +02:00
ngxson
5ffba9ecc3
add readme
2024-06-11 19:35:17 +02:00
ngxson
04c91d29ff
use ggml_format_name
2024-06-11 19:14:04 +02:00
ngxson
54f77e2467
add to makefile all targets
2024-06-11 19:03:13 +02:00
ngxson
85db22dd20
Merge branch 'master' into xsn/control-vector-generator
2024-06-11 19:00:19 +02:00
ngxson
da6babdf0a
fix macos build
2024-06-11 15:47:35 +02:00
ngxson
3223133cf5
default n_pca_batch to 20
2024-06-11 15:05:06 +02:00
Johannes Gäßler
148995e5e5
llama-bench: more compact markdown tables (#7879)
2024-06-11 14:45:40 +02:00
ngxson
d41c719980
bring back n_completions
2024-06-11 14:31:45 +02:00
Christian Zhou-Zheng
446da906d9
fix n_completions
2024-06-11 08:22:38 -04:00
ngxson
163916864c
remember to copy back the last_eigenvector
2024-06-11 12:40:07 +02:00
ngxson
1a088fb0a5
working version
2024-06-11 12:37:05 +02:00
ngxson
9e39571fc2
add n_batch for pca
2024-06-11 11:45:16 +02:00
Olivier Chafik
b61eb9644d
json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866)
2024-06-11 02:22:57 +01:00
Olivier Chafik
396b18dfec
`json`: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841)
* json: fix char pattern in grammar converters
* json: prevent number precision & whitespace runaways in example grammars
* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
ngxson
6a5adf3d7c
fix shape of v_diff_original
2024-06-11 01:33:16 +02:00
ngxson
c241b500a1
clean up PCA ggml implementation
2024-06-11 01:13:10 +02:00
Georgi Gerganov
c28a83902c
examples : remove --instruct remnants (#7846)
2024-06-10 15:00:15 +03:00
Georgi Gerganov
d9da0e4986
server : improve "prompt" handling (#7847)
2024-06-10 14:59:55 +03:00
Georgi Gerganov
e95beeb1fc
imatrix : handle partial entries (#7833)
2024-06-09 20:19:35 +03:00
mgroeber9110
3e2ee44315
server: do not remove whitespace at the start of a completion chunk (#7830)
2024-06-09 20:50:35 +10:00
slaren
fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
sasha0552
7a16ce7db2
server : smart slot selection using Longest Common Prefix (#7728)
* server : Smart selection of available slot using Longest Common Substring
* add usage
* remove trailing whitespaces
* Use Longest Common Prefix (LCP) instead of LCS
* Rename argument
2024-06-08 10:50:31 +03:00
ngxson
a710df749c
(wip) refactor
2024-06-07 15:37:58 +02:00
Christian Zhou-Zheng
c00fad71e5
gguf-split : change binary multi-byte units to decimal (#7803)
2024-06-07 15:56:01 +03:00
Johannes Gäßler
7027b27d76
server: update cache_prompt documentation [no ci] (#7745)
2024-06-07 11:15:49 +02:00
woodx
a5cabd7649
server : do not get prompt in infill mode (#7286)
* avoid getting prompt in infill mode and embedding mode
* remove embedding mode
* refactor format
---------
Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
slaren
c9ee7118d5
check for nans in imatrix and quantize (#7807)
* imatrix : detect nan/inf values
* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
Georgi Gerganov
f83351f9a6
imatrix : migrate to gpt_params (#7771)
* imatrix : migrate to gpt_params
ggml-ci
* imatrix : add --save-frequency cli arg
* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Olivier Chafik
55b2d0849d
grammars: x{min,max} repetition operator (#6640)
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates
* grammars: handle `x{n}` and fix `x{n,n}`
* grammars: document new repetition operators
* grammars: uniform use of int for min & max
* grammars: refactor parser test
* grammar: parsing tests w/ natural pretty print of updated expectations
* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)
* grammars: improve test pretty print again
* grammars: pretty print rules and chars
* grammars: fix copy rule skipping
* grammars: disallow `a{,}` (not allowed in regexps)
* Update common/grammar-parser.cpp
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: fix copy rule skipping (again) & display of expectations
* grammars: more test cases
* grammars: update reps parsing to bring ? / * / + closer to before
* json: use new GBNF repetitions{m,n} syntax
* grammars: update performance gotchas w/ repetition advice
* Update examples/json_schema_to_grammar.py
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: comment on rule repetitions
* grammars: ensure unambiguous number alternatives
* grammar: nit typo switched error msgs
* grammar: nit numbering in comment
* json: update numeric rule to be unambiguous
* Apply suggestions from code review
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* json: fix integral-part
* grammar: add repetition tests
---------
Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
Georgi Gerganov
2b3389677a
ggml : refactor rope norm/neox (#7634)
* ggml : unify rope norm/neox (CPU)
* ggml : fix compile warning
* ggml : remove GLM rope mode
ggml-ci
* metal : better rope implementation
ggml-ci
* cuda : better rope implementation
ggml-ci
* naming : n_orig_ctx -> n_ctx_orig
ggml-ci
* dev : add reminders to update backends
ggml-ci
* vulkan : fix ggml_rope_ext() usage
* cuda : fix array size + indents
ggml-ci
2024-06-05 11:29:20 +03:00
arch-btw
9973e81c5c
readme : remove -ins (#7759)
-ins and --instruct were removed in https://github.com/ggerganov/llama.cpp/pull/7675
I have adjusted the README accordingly.
There was no trace of --chatml in the README.
2024-06-05 09:40:49 +03:00
Georgi Gerganov
1442677f92
common : refactor cli arg parsing (#7675)
* common : gpt_params_parse do not print usage
* common : rework usage print (wip)
* common : valign
* common : rework print_usage
* infill : remove cfg support
* common : reorder args
* server : deduplicate parameters
ggml-ci
* common : add missing header
ggml-ci
* common : remove --random-prompt usages
ggml-ci
* examples : migrate to gpt_params
ggml-ci
* batched-bench : migrate to gpt_params
* retrieval : migrate to gpt_params
* common : change defaults for escape and n_ctx
* common : remove chatml and instruct params
ggml-ci
* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
Georgi Gerganov
554c247caf
ggml : remove OpenCL (#7735)
ggml-ci
2024-06-04 21:23:20 +03:00