Commit Graph

3131 Commits

Author SHA1 Message Date
Olivier Chafik efaa441233 fix llama-lookup-* Makefile rules 2024-06-08 14:26:11 +01:00
Olivier Chafik b0eb3b88e9 rm bin files 2024-06-08 14:16:32 +01:00
Olivier Chafik eef922e02e sort cmake example subdirs 2024-06-08 14:09:28 +01:00
Olivier Chafik b648243496 add/fix gbnf-validator subfolder to cmake 2024-06-08 14:07:56 +01:00
Olivier Chafik 81222f02db prefix more cmake targets w/ llama- 2024-06-08 14:05:34 +01:00
Olivier Chafik 10650b692d rename {main->llama}-cmake-pkg binary 2024-06-08 13:57:06 +01:00
Olivier Chafik 78bca8cb07 fix main refs 2024-06-08 13:52:03 +01:00
Olivier Chafik ab5efbb3b6 Prefix all example bins w/ llama- 2024-06-08 13:42:01 +01:00
Olivier Chafik 23d0df5bd5 main: target name -> llama-cli 2024-06-08 12:50:35 +01:00
Olivier Chafik fe93cc96cc Merge remote-tracking branch 'origin/master' into bins 2024-06-08 12:04:52 +01:00
sasha0552 7a16ce7db2
server : smart slot selection using Longest Common Prefix (#7728)
* server : Smart selection of available slot using Longest Common Substring

* add usage

* remove trailing whitespaces

* Use Longest Common Prefix (LCP) instead of LCS

* Rename argument
2024-06-08 10:50:31 +03:00
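The idea behind the change above: when a new request arrives, reuse the slot whose cached tokens share the longest common prefix with the incoming prompt, so as much of the KV cache as possible is kept. The sketch below illustrates that selection in C++; the names (`common_prefix_len`, `pick_slot_by_lcp`) and the token layout are hypothetical, not the actual server code.

```cpp
// Minimal sketch of Longest-Common-Prefix slot selection (illustrative only).
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Length of the common prefix between a slot's cached tokens and the new prompt.
static size_t common_prefix_len(const std::vector<int32_t> & cached,
                                const std::vector<int32_t> & prompt) {
    size_t n = 0;
    const size_t lim = std::min(cached.size(), prompt.size());
    while (n < lim && cached[n] == prompt[n]) {
        n++;
    }
    return n;
}

// Pick the slot whose cache shares the longest prefix with the prompt,
// so the largest portion of the KV cache can be reused.
static int pick_slot_by_lcp(const std::vector<std::vector<int32_t>> & slot_caches,
                            const std::vector<int32_t> & prompt) {
    int    best_slot = -1;
    size_t best_len  = 0;
    for (int i = 0; i < (int) slot_caches.size(); i++) {
        const size_t len = common_prefix_len(slot_caches[i], prompt);
        if (best_slot == -1 || len > best_len) {
            best_slot = i;
            best_len  = len;
        }
    }
    return best_slot;
}
```

A prefix (rather than the Longest Common Substring of the earlier revision) is what matters here, because the KV cache can only be reused up to the first position where the cached tokens and the new prompt diverge.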
slaren da799b4189
vulkan : reuse parent extra for views (#7806)
* vulkan : reuse parent extra for views

* Fix validation error when multiple compute contexts are used in a graph

---------

Co-authored-by: 0cc4m <picard12@live.de>
2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng c00fad71e5
gguf-split : change binary multi-byte units to decimal (#7803) 2024-06-07 15:56:01 +03:00
intelmatt 27615f5ab2
cmake : fix BUILD_SHARED_LIBS=ON build (#7784)
common depends on pthreads on Linux
2024-06-07 15:15:07 +03:00
Olivier Chafik 0dba58269f Update server-llm.sh 2024-06-07 11:52:40 +01:00
Johannes Gäßler 7027b27d76
server: update cache_prompt documentation [no ci] (#7745) 2024-06-07 11:15:49 +02:00
ochafik af8f0169da Update .gitignore 2024-06-07 10:14:03 +01:00
ochafik 7fbe6006c9 update straggling refs 2024-06-07 09:42:21 +01:00
ochafik 99df4cc091 rm accidentally checked in bins 2024-06-07 09:40:09 +01:00
woodx a5cabd7649
server : do not get prompt in infill mode (#7286)
* avoid to get prompt in infill mode and embedding mode

* remove embedding mode

* refactor format

---------

Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
pengxin99 d5c938cd77
[SYCL] fix softmax r2r result wrong issue (#7811) 2024-06-07 14:28:26 +08:00
slaren c9ee7118d5
check for nans in imatrix and quantize (#7807)
* imatrix : detect nan/inf values

* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
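The commit above adds sanity checks rather than new math: detect NaN/Inf values produced by imatrix and reject such an imatrix during quantization. A hedged sketch of what such a check can look like, using `std::isfinite`; it is an illustration, not the actual imatrix/quantize code:

```cpp
// Illustrative nan/inf check over a buffer of importance-matrix values.
#include <cmath>
#include <cstdio>
#include <vector>

static bool validate_values(const char * name, const std::vector<float> & data) {
    for (size_t i = 0; i < data.size(); i++) {
        if (!std::isfinite(data[i])) {
            fprintf(stderr, "%s: invalid value %f at index %zu\n", name, data[i], i);
            return false;
        }
    }
    return true;
}
```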
ochafik fbd83131f5 Merge remote-tracking branch 'origin/master' into bins 2024-06-07 00:51:31 +01:00
ochafik a0a7f2b031 Update build.yml 2024-06-07 00:38:05 +01:00
ochafik 8695baebc0 update more names 2024-06-07 00:21:01 +01:00
Georgi Gerganov ee459f40f6
server : fix --threads-http arg (#7801) 2024-06-06 19:19:59 +03:00
Olivier Chafik 9a03341094 main/server: fix targets 2024-06-06 15:53:25 +01:00
Olivier Chafik 8b7c734473 main: update refs -> llama
fix examples/main ref
2024-06-06 15:44:51 +01:00
Olivier Chafik f5f19a236f server: simplify nix package 2024-06-06 15:44:40 +01:00
Olivier Chafik f298cc63d2 server: update refs -> llama-server
gitignore llama-server
2024-06-06 15:44:40 +01:00
Olivier Chafik 849842916d `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew 2024-06-06 15:28:27 +01:00
Georgi Gerganov f83351f9a6
imatrix : migrate to gpt_params (#7771)
* imatrix : migrate to gpt_params

ggml-ci

* imatrix : add --save-frequency cli arg

* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Clint Herron ad675e1c67
Added support for . (any character) token in grammar engine. (#6467)
* Added support for . (any character) token in grammar engine.

* Add integration tests for any-character symbol.
2024-06-06 06:08:52 -07:00
Mattheus Chediak a143c04375
README minor fixes (#7798) [no ci]
derievatives --> derivatives
2024-06-06 22:17:54 +10:00
Olivier Chafik 55b2d0849d
grammars: x{min,max} repetition operator (#6640)
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates

* grammars: handle `x{n}` and fix `x{n,n}`

* grammars: document new repetition operators

* grammars: uniform use of int for min & max

* grammars: refactor parser test

* grammar: parsing tests w/ natural pretty print of updated expectations

* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)

* grammars: improve test pretty print again

* grammars: pretty print rules and chars

* grammars: fix copy rule skipping

* grammars: disallow `a{,}` (not allowed in regexps)

* Update common/grammar-parser.cpp

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: fix copy rule skipping (again) & display of expectations

* grammars: more test cases

* grammars: update reps parsing to bring ? / * / + closer to before

* json: use new GBNF repetitions{m,n} syntax

* grammars: update performance gotchas w/ repetition advice

* Update examples/json_schema_to_grammar.py

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: comment on rule repetitions

* grammars: ensure unambiguous number alternatives

* grammar: nit typo switched error msgs

* grammar: nit numbering in comment

* json: update numeric rule to be unambiguous

* Apply suggestions from code review

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* json: fix integral-part

* grammar: add repetition tests

---------

Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
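For readers following the grammar work: #6640 extends GBNF with bounded repetition, so `x{n}` and `x{min,max}` can be written directly instead of unrolling alternates, while `a{,}` is rejected to match regexp behaviour. Below is a made-up grammar using the new syntax, embedded in a C++ string as it might be handed to the parser; the rules themselves are illustrative and not taken from the PR:

```cpp
#include <cstdio>

int main() {
    // Hypothetical GBNF grammar exercising the new repetition operators:
    //   x{n}        exactly n repetitions
    //   x{min,max}  between min and max repetitions
    static const char * grammar = R"""(
root ::= word ("-" word){0,3}
word ::= [a-z]{1,16}
)""";
    printf("%s\n", grammar);
    return 0;
}
```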
Joan Fontanals f5d7b268ec
llama : add jina v2 base code (#7596)
* feat: add changes to handle jina v2 base code

* fix: do not complicate things

* fix: fix the usage of the code model

* fix: fix comments

* fix: fix linting issues

* fix: remove ollama patches

* style : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-06 10:22:41 +03:00
slaren 2d08b7fbb4
docker : build only main and server in their images (#7782)
* add openmp lib to dockerfiles

* build only main and server in their docker images
2024-06-06 08:19:49 +03:00
slaren d67caea0d6
docker : add openmp lib (#7780) 2024-06-06 08:17:21 +03:00
Galunid 7672adeec7
Fix encoding in python scripts (#7733) 2024-06-06 03:07:24 +10:00
Johannes Gäßler 7d1a378b8f
CUDA: refactor mmq, dmmv, mmvq (#7716)
* CUDA: refactor mmq, dmmv, mmvq

* fix out-of-bounds write

* struct for qk, qr, qi

* fix cmake build

* mmq_type_traits
2024-06-05 16:53:00 +02:00
Georgi Gerganov 2b3389677a
ggml : refactor rope norm/neox (#7634)
* ggml : unify rope norm/neox (CPU)

* ggml : fix compile warning

* ggml : remove GLM rope mode

ggml-ci

* metal : better rope implementation

ggml-ci

* cuda : better rope implementation

ggml-ci

* naming : n_orig_ctx -> n_ctx_orig

ggml-ci

* dev : add reminders to update backends

ggml-ci

* vulkan : fix ggml_rope_ext() usage

* cuda : fix array size + indents

ggml-ci
2024-06-05 11:29:20 +03:00
arch-btw 9973e81c5c
readme : remove -ins (#7759)
-ins and --instruct were removed in https://github.com/ggerganov/llama.cpp/pull/7675

I have adjusted the README accordingly.
There was no trace of --chatml in the README.
2024-06-05 09:40:49 +03:00
jaime-m-p c90dbe026b
Fix per token attributes bits (#7749) 2024-06-05 01:26:14 +02:00
agray3 b90dc566c1
Allow number of nodes in CUDA graph to change (#7738)
Previously the code would have failed if the number of nodes in an
existing CUDA graph changed. This fixes the issue by removing an
unnecessary conditional.
2024-06-04 22:06:49 +02:00
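As context for the change above: the CUDA backend caches an instantiated graph so it can be re-launched across evaluations, and a captured graph whose node count has changed cannot simply be assumed to match the cached exec. The snippet below is a hedged sketch of one way to detect such a change with the plain CUDA runtime API; it is not the ggml-cuda code, and the actual fix in this commit was simply to drop a conditional that mishandled this case.

```cpp
// Hedged sketch: detect a node-count change in a captured CUDA graph so the
// caller knows the cached cudaGraphExec_t must be rebuilt instead of reused.
#include <cuda_runtime.h>
#include <cstddef>

static bool cuda_graph_node_count_changed(cudaGraph_t graph, size_t prev_num_nodes) {
    size_t num_nodes = 0;
    // Passing nullptr for the node array makes cudaGraphGetNodes report only the count.
    if (cudaGraphGetNodes(graph, /*nodes=*/nullptr, &num_nodes) != cudaSuccess) {
        return true; // be conservative on error and force a rebuild
    }
    return num_nodes != prev_num_nodes;
}
```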
Georgi Gerganov 1442677f92
common : refactor cli arg parsing (#7675)
* common : gpt_params_parse do not print usage

* common : rework usage print (wip)

* common : valign

* common : rework print_usage

* infill : remove cfg support

* common : reorder args

* server : deduplicate parameters

ggml-ci

* common : add missing header

ggml-ci

* common : remove --random-prompt usages

ggml-ci

* examples : migrate to gpt_params

ggml-ci

* batched-bench : migrate to gpt_params

* retrieval : migrate to gpt_params

* common : change defaults for escape and n_ctx

* common : remove chatml and instruct params

ggml-ci

* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
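From an example's perspective, "migrate to gpt_params" in the refactor above means dropping ad-hoc argument handling in favour of the shared `gpt_params` struct and `gpt_params_parse`. A rough sketch of that pattern follows; everything beyond `gpt_params`/`gpt_params_parse` is an assumption about the tree at this point and may differ:

```cpp
// Sketch of an example migrated to gpt_params (assumed shape, not verbatim code).
#include "common.h"

#include <cstdio>

int main(int argc, char ** argv) {
    gpt_params params;

    // Per the first bullet of #7675, gpt_params_parse no longer prints usage
    // itself on failure; the caller decides how to report the error.
    if (!gpt_params_parse(argc, argv, params)) {
        fprintf(stderr, "error: failed to parse command-line arguments\n");
        return 1;
    }

    // ... example-specific work driven by params (e.g. params.model, params.n_ctx)
    return 0;
}
```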
Georgi Gerganov 554c247caf
ggml : remove OpenCL (#7735)
ggml-ci
2024-06-04 21:23:20 +03:00
Georgi Gerganov 0cd6bd3483
llama : remove beam search (#7736) 2024-06-04 21:23:05 +03:00
Georgi Gerganov 5ca0944a15
readme : remove obsolete Zig instructions (#7471) 2024-06-04 19:43:01 +03:00
slaren adc9ff3841
llama-bench : allow using a different printer for stderr with -oe (#7722)
compare-commits.sh : hide stdout, use -oe to print markdown
2024-06-04 14:32:42 +02:00
Daniele 987d743d6b
Improve hipBLAS support in CMake (#7696)
* Improve hipBLAS support in CMake

This improves the detection of the correct CMAKE_PREFIX_PATH when using different distributions or a self-built ROCm SDK.

* Set ROCM_PATH correctly
2024-06-04 14:09:15 +02:00