Commit Graph

957 Commits

Author SHA1 Message Date
ngxson f54cb8e307 reuse allocr 2024-06-12 12:53:17 +02:00
ngxson 8ee0c96688 fix compile warn 2024-06-12 12:50:29 +02:00
ngxson e683b9af60 attempt to fix compile problem on mac 2024-06-12 12:49:01 +02:00
ngxson 7297817d13 use ggml_backend_tensor_copy 2024-06-12 11:41:37 +02:00
ngxson e9cb3b336d fix .editorconfig 2024-06-11 22:09:14 +02:00
ngxson 5ffba9ecc3 add readme 2024-06-11 19:35:17 +02:00
ngxson 04c91d29ff use ggml_format_name 2024-06-11 19:14:04 +02:00
ngxson 54f77e2467 add to makefile all targets 2024-06-11 19:03:13 +02:00
ngxson 85db22dd20 Merge branch 'master' into xsn/control-vector-generator 2024-06-11 19:00:19 +02:00
ngxson da6babdf0a fix macos build 2024-06-11 15:47:35 +02:00
ngxson 3223133cf5 default n_pca_batch to 20 2024-06-11 15:05:06 +02:00
Johannes Gäßler 148995e5e5 llama-bench: more compact markdown tables (#7879) 2024-06-11 14:45:40 +02:00
ngxson d41c719980 bring back n_completions 2024-06-11 14:31:45 +02:00
Christian Zhou-Zheng 446da906d9 fix n_completions 2024-06-11 08:22:38 -04:00
ngxson 163916864c remember to copy back the last_eigenvector 2024-06-11 12:40:07 +02:00
ngxson 1a088fb0a5 working version 2024-06-11 12:37:05 +02:00
ngxson 9e39571fc2 add n_batch for pca 2024-06-11 11:45:16 +02:00
Olivier Chafik b61eb9644d json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866) 2024-06-11 02:22:57 +01:00
Olivier Chafik 396b18dfec
`json`: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841)
* json: fix char pattern in grammar converters

* json: prevent number precision & whitespace runaways in example grammars

* json: add doc to grammar readme
2024-06-11 01:00:30 +01:00
ngxson 6a5adf3d7c fix shape of v_diff_original 2024-06-11 01:33:16 +02:00
ngxson c241b500a1 clean up PCA ggml implementation 2024-06-11 01:13:10 +02:00
Georgi Gerganov c28a83902c examples : remove --instruct remnants (#7846) 2024-06-10 15:00:15 +03:00
Georgi Gerganov d9da0e4986 server : improve "prompt" handling (#7847) 2024-06-10 14:59:55 +03:00
Georgi Gerganov e95beeb1fc imatrix : handle partial entries (#7833) 2024-06-09 20:19:35 +03:00
mgroeber9110 3e2ee44315 server: do not remove whitespace at the start of a completion chunk (#7830) 2024-06-09 20:50:35 +10:00
slaren fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808)
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
sasha0552 7a16ce7db2
server : smart slot selection using Longest Common Prefix (#7728)
* server : Smart selection of available slot using Longest Common Substring

* add usage

* remove trailing whitespaces

* Use Longest Common Prefix (LCP) instead of LCS

* Rename argument
2024-06-08 10:50:31 +03:00
ngxson a710df749c (wip) refactor 2024-06-07 15:37:58 +02:00
Christian Zhou-Zheng c00fad71e5 gguf-split : change binary multi-byte units to decimal (#7803) 2024-06-07 15:56:01 +03:00
Johannes Gäßler 7027b27d76 server: update cache_prompt documentation [no ci] (#7745) 2024-06-07 11:15:49 +02:00
woodx a5cabd7649
server : do not get prompt in infill mode (#7286)
* avoid to get prompt in infill mode and embedding mode

* remove embedding mode

* refactor format

---------

Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
slaren c9ee7118d5
check for nans in imatrix and quantize (#7807)
* imatrix : detect nan/inf values

* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
Georgi Gerganov f83351f9a6
imatrix : migrate to gpt_params (#7771)
* imatrix : migrate to gpt_params

ggml-ci

* imatrix : add --save-frequency cli arg

* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Olivier Chafik 55b2d0849d
grammars: x{min,max} repetition operator (#6640)
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates

* grammars: handle `x{n}` and fix `x{n,n}`

* grammars: document new repetition operators

* grammars: uniform use of int for min & max

* grammars: refactor parser test

* grammar: parsing tests w/ natural pretty print of updated expectations

* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)

* grammars: improve test pretty print again

* grammars: pretty print rules and chars

* grammars: fix copy rule skipping

* grammars: disallow `a{,}` (not allowed in regexps)

* Update common/grammar-parser.cpp

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: fix copy rule skipping (again) & display of expectations

* grammars: more test cases

* grammars: update reps parsing to bring ? / * / + closer to before

* json: use new GBNF repetitions{m,n} syntax

* grammars: update performance gotchas w/ repetition advice

* Update examples/json_schema_to_grammar.py

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: comment on rule repetitions

* grammars: ensure unambiguous number alternatives

* grammar: nit typo switched error msgs

* grammar: nit numbering in comment

* json: update numeric rule to be unambiguous

* Apply suggestions from code review

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* json: fix integral-part

* grammar: add repetition tests

---------

Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
Georgi Gerganov 2b3389677a
ggml : refactor rope norm/neox (#7634)
* ggml : unify rope norm/neox (CPU)

* ggml : fix compile warning

* ggml : remove GLM rope mode

ggml-ci

* metal : better rope implementation

ggml-ci

* cuda : better rope implementation

ggml-ci

* naming : n_orig_ctx -> n_ctx_orig

ggml-ci

* dev : add reminders to update backends

ggml-ci

* vulkan : fix ggml_rope_ext() usage

* cuda : fix array size + indents

ggml-ci
2024-06-05 11:29:20 +03:00
arch-btw 9973e81c5c
readme : remove -ins (#7759)
-ins and --instruct were moved in https://github.com/ggerganov/llama.cpp/pull/7675

I have adjusted the README accordingly.
There was no trace of --chatml in the README.
2024-06-05 09:40:49 +03:00
Georgi Gerganov 1442677f92
common : refactor cli arg parsing (#7675)
* common : gpt_params_parse do not print usage

* common : rework usage print (wip)

* common : valign

* common : rework print_usage

* infill : remove cfg support

* common : reorder args

* server : deduplicate parameters

ggml-ci

* common : add missing header

ggml-ci

* common : remote --random-prompt usages

ggml-ci

* examples : migrate to gpt_params

ggml-ci

* batched-bench : migrate to gpt_params

* retrieval : migrate to gpt_params

* common : change defaults for escape and n_ctx

* common : remove chatml and instruct params

ggml-ci

* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
Georgi Gerganov 554c247caf
ggml : remove OpenCL (#7735)
ggml-ci
2024-06-04 21:23:20 +03:00
Georgi Gerganov 0cd6bd3483 llama : remove beam search (#7736) 2024-06-04 21:23:05 +03:00
slaren adc9ff3841
llama-bench : allow using a different printer for stderr with -oe (#7722)
compare-commits.sh : hide stdout, use -oe to print markdown
2024-06-04 14:32:42 +02:00
Christian Zhou-Zheng a42e783d75 update comments 2024-06-03 21:33:46 -04:00
Christian Zhou-Zheng 3815a0c306 pre-tokenize so we can allocate correct memory to ctx_diffs_wrapped 2024-06-03 21:26:13 -04:00
Christian Zhou-Zheng 23fd1b587c update debug statements 2024-06-03 21:14:43 -04:00
Christian Zhou-Zheng 07dba13ab6 temporary commit while I move dev environments
it finally outputs a functioning control vector - "functioning" in the sense that it can be loaded and it clearly has the right idea, but makes the model incoherent
2024-06-03 17:40:19 -04:00
nickp27 9422c5e34b
[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)
* Update rpc-server.cpp to include SYCL backend

Draft PR to address inclusion of SYCL backend for RPC server

* Update rpc-server.cpp
2024-06-02 12:13:54 +03:00
ngxson 15d5c257a0 fix cb_eval 2024-06-02 10:58:11 +02:00
Christian Zhou-Zheng a23c72e4c0 fix ggml errors and make new ones
at least it compiles and runs
2024-06-01 22:19:33 -04:00
Christian Zhou-Zheng b67ea65983 tentatively translate the rest 2024-06-01 20:47:28 -04:00
Christian Zhou-Zheng 0e1f9734de translated everything but PCA (I think) 2024-06-01 19:50:46 -04:00
Christian Zhou-Zheng df623fffe8 interim fix memory leak 2024-06-01 18:36:54 -04:00