Commit Graph

261 Commits

Author SHA1 Message Date
Georgi Gerganov 771d84371c scripts : update sync + fix cmake merge
ggml-ci
2025-03-27 10:09:29 +02:00
Georgi Gerganov df0665a483 sync : ggml
ggml-ci
2025-03-27 09:04:38 +02:00
Georgi Gerganov 102ac1891d sync : ggml
ggml-ci
2025-03-07 14:49:44 +02:00
Olivier Chafik 669912d9a5
`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
* sampler: turn lazy grammar trigger words to regexes

* add scripts/tool_bench.sh & .py

* constrain llama json output regardless of function name if matches at beginning

* update relaxed newline space rule in grammar tests

* support add_generation_prompt query parameter (useful for /apply_template)

* Update src/llama-grammar.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
Daniel Bevenius a057897ad4
llama : add xcframework build script (#11996)
* llama : add xcframework build script

This commit adds a script to build an XCFramework for Apple
ios, macos, visionos, and tvos platforms.

The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```

Refs: https://github.com/ggml-org/llama.cpp/issues/10747

* examples : remove llama.cpp (source dir ref) from project.pbxproj

This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.

* ci : updated build.yml to use build-xcframework.sh

* ci : add xcframework build to github releases

This commit adds the ability to create a GitHub release with the
xcframework build artifact.

* scripts : add apple app validation scripts

This commit adds scripts that can validate the iOS, macOS, tvOS, and
VisionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.

The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.

* llama : remove Package.swift

This commit removes the Package.swift file, as we are now building an
XCFramework for the project.

* llama : remove Sources and spm-headers directories

* llama : use TargetConditionals.h for visionOS/tvOS
2025-03-05 06:30:31 +01:00
Georgi Gerganov dfd6b2c0be sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00
Georgi Gerganov 3d1cf3cf33 sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00
Georgi Gerganov 8371d44595 sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00
Georgi Gerganov aede2074f6 scripts : sync-ggml-am.sh fix 2025-03-03 18:18:11 +02:00
MoonRide303 5137da7b8c
scripts: corrected encoding when getting chat template (#11866) (#11907)
Signed-off-by: MoonRide303 <moonride303@gmail.com>
2025-02-18 10:30:16 +01:00
Johannes Gäßler 6dde178248
scripts: fix compare-llama-bench commit hash logic (#11891) 2025-02-15 20:23:22 +01:00
Georgi Gerganov 68ff663a04
repo : update links to new url (#11886)
* repo : update links to new url

ggml-ci

* cont : more urls

ggml-ci
2025-02-15 16:40:57 +02:00
Olivier Chafik c7f460ab88
`server`: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command 7RB & DeepSeek R1) unless `--reasoning-format none` (#11607)
* extract & return thoughts in reasoning_content field (unless --reasoning-format) for DeepSeek R1 & Command R7B

* tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template

* tool-calls: accommodate variety of wrong tool call opening tags both R1 Qwen 32B and 7B distills like to spit out

* server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability

* tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 10:05:16 +00:00
Georgi Gerganov 0fb77f821f
sync : ggml 2025-02-12 21:46:02 +02:00
Georgi Gerganov 8a59053f63
sync : ggml 2025-02-06 21:23:03 +02:00
Georgi Gerganov 7c9e0ca520
sync : ggml 2025-02-04 12:59:21 +02:00
Georgi Gerganov 8ec05832fa
sync : ggml 2025-02-03 14:57:08 +02:00
Olivier Chafik 8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
Georgi Gerganov 815857791d
sync : ggml 2025-01-29 11:25:29 +02:00
Olivier Chafik 6171c9d258
Add Jinja template support (#11016)
* Copy minja from 58f0ca6dd7

* Add --jinja and --chat-template-file flags

* Add missing <optional> include

* Avoid print in get_hf_chat_template.py

* No designated initializers yet

* Try and work around msvc++ non-macro max resolution quirk

* Update test_chat_completion.py

* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template

* Refactor test-chat-template

* Test templates w/ minja

* Fix deprecation

* Add --jinja to llama-run

* Update common_chat_format_example to use minja template wrapper

* Test chat_template in e2e test

* Update utils.py

* Update test_chat_completion.py

* Update run.cpp

* Update arg.cpp

* Refactor common_chat_* functions to accept minja template + use_jinja option

* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE

* Revert LLAMA_CHATML_TEMPLATE refactor

* Normalize newlines in test-chat-templates for windows tests

* Forward decl minja::chat_template to avoid eager json dep

* Flush stdout in chat template before potential crash

* Fix copy elision warning

* Rm unused optional include

* Add missing optional include to server.cpp

* Disable jinja test that has a cryptic windows failure

* minja: fix vigogne (https://github.com/google/minja/pull/22)

* Apply suggestions from code review

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Finish suggested renamings

* Move chat_templates inside server_context + remove mutex

* Update --chat-template-file w/ recent change to --chat-template

* Refactor chat template validation

* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)

* Warn against missing eos / bos tokens when jinja template references them

* rename: common_chat_template[s]

* reinstate assert on chat_templates.template_default

* Update minja to b8437df626

* Update minja to https://github.com/google/minja/pull/25

* Update minja from https://github.com/google/minja/pull/27

* rm unused optional header

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-21 13:18:51 +00:00
Georgi Gerganov f26c874179
scripts : restore hf.sh (#11288)
ggml-ci
2025-01-18 13:18:32 +02:00
Georgi Gerganov f11cfdfd7f
ci : use -no-cnv in gguf-split tests (#11254)
* ci : use -no-cnv in gguf-split tests

ggml-ci

* ci : use -no-cnv in requantize tests

ggml-ci

* scripts : fix [no ci]
2025-01-15 18:28:35 +02:00
Georgi Gerganov 44d1e796d0
sync : ggml 2025-01-14 10:39:42 +02:00
Georgi Gerganov a4f3f5d8e6
scripts : sync gguf (cont) 2025-01-14 09:40:52 +02:00
Georgi Gerganov 48e1ae0e61
scripts : sync gguf 2025-01-14 09:36:58 +02:00
Georgi Gerganov d00a80e89d
scripts : sync opencl 2025-01-14 09:19:58 +02:00
Georgi Gerganov 99a3755a3c
sync : ggml 2025-01-08 13:40:30 +02:00
Georgi Gerganov 78c6785175 sync : ggml 2025-01-04 16:09:53 +02:00
Djip007 2cd43f4900
ggml : more perfo with llamafile tinyblas on x86_64 (#10714)
* more perfo with llamafile tinyblas on x86_64.

- add bf16 suport
- change dispache strategie (thanks:
https://github.com/ikawrakow/ik_llama.cpp/pull/71 )
- reduce memory bandwidth

simple tinyblas dispache and more cache freindly

* tinyblas dynamic dispaching

* sgemm: add M blocs.

* - git 2.47 use short id of len 9.
- show-progress is not part of GNU Wget2

* remove not stable test
2024-12-24 18:54:49 +01:00
Georgi Gerganov 5437d4aaf5
sync : ggml 2024-12-17 18:36:02 +02:00
Georgi Gerganov 87cf323cef
scripts : change build path to "build-bench" for compare-commits.sh (#10836) 2024-12-15 18:44:47 +02:00
Georgi Gerganov 0cd182ebcc
sync : ggml 2024-12-05 13:27:42 +02:00
Diego Devesa 59f4db1088
ggml : add predefined list of CPU backend variants to build (#10626)
* ggml : add predefined list of CPU backend variants to build

* update CPU dockerfiles
2024-12-04 14:45:40 +01:00
Georgi Gerganov 1cd3df46bd scripts : remove amx sync
ggml-ci
2024-12-03 20:04:49 +02:00
Georgi Gerganov c505471857 sync : ggml 2024-12-03 20:04:49 +02:00
Georgi Gerganov 8648c52101
make : deprecate (#10514)
* make : deprecate

ggml-ci

* ci : disable Makefile builds

ggml-ci

* docs : remove make references [no ci]

* ci : disable swift build

ggml-ci

* docs : remove obsolete make references, scripts, examples

ggml-ci

* basic fix for compare-commits.sh

* update build.md

* more build.md updates

* more build.md updates

* more build.md updates

* Update Makefile

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-12-02 21:22:53 +02:00
Diego Devesa 3420909dff
ggml : automatic selection of best CPU backend (#10606)
* ggml : automatic selection of best CPU backend

* amx : minor opt

* add GGML_AVX_VNNI to enable avx-vnni, fix checks
2024-12-01 16:12:41 +01:00
Georgi Gerganov fee824a1a1
sync : ggml 2024-11-27 11:10:42 +02:00
Georgi Gerganov 87a533be57
sync : ggml 2024-11-21 09:22:11 +02:00
Georgi Gerganov 9fe0fb0626 sync : ggml 2024-11-19 20:03:21 +02:00
Georgi Gerganov 5c9a8b22b1 scripts : update sync 2024-11-17 08:30:29 +02:00
Johannes Gäßler 4e54be0ec6
llama/ex: remove --logdir argument (#10339) 2024-11-16 23:00:41 +01:00
Georgi Gerganov f245cc28d4
scripts : fix missing key in compare-llama-bench.py (#10332) 2024-11-16 10:32:50 +02:00
Johannes Gäßler 4047be74da
scripts: update compare-llama-bench.py (#10319) 2024-11-15 21:19:03 +01:00
Georgi Gerganov cbf5541a82 sync : ggml 2024-11-15 15:44:06 +02:00
Georgi Gerganov 4802ad350b
scripts : fix regex in sync [no ci] 2024-11-15 08:38:43 +02:00
Georgi Gerganov 5ea926dad7
sync : ggml 2024-11-13 18:11:54 +02:00
Georgi Gerganov eec4d71737
scripts : add amx to sync-ggml.sh [no ci] 2024-11-07 23:11:36 +02:00
Georgi Gerganov 3b08828674
sync : ggml 2024-11-07 23:08:24 +02:00
Georgi Gerganov a2c6fd747c
scripts : sync update 2024-11-07 23:07:55 +02:00
Georgi Gerganov ce027adfb3
sync : ggml 2024-11-04 10:33:37 +02:00
Georgi Gerganov 815fe72adc
sync : ggml 2024-11-01 10:28:24 +02:00
Diego Devesa c5b0f4b5d9
llama : refactor model loader with backend registry (#10026) 2024-10-30 02:01:23 +01:00
Georgi Gerganov 8d8ff71536
llama : remove Tail-Free sampling (#10071)
ggml-ci
2024-10-29 10:42:05 +02:00
Georgi Gerganov cc2983d375
sync : ggml 2024-10-26 10:34:08 +03:00
Georgi Gerganov 9e4a2563ea
scripts : fix amx sync [no ci] 2024-10-26 10:33:31 +03:00
Georgi Gerganov 190a37d797
sync : ggml 2024-10-23 17:23:55 +03:00
Georgi Gerganov 17bb928080
readme : remove --memory-f32 references (#9925) 2024-10-17 23:43:05 +03:00
Georgi Gerganov 0e41b300ed
sync : ggml 2024-10-16 11:28:14 +03:00
standby24x7 fa42aa6d89
scripts : fix spelling typo in messages and comments (#9782)
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
2024-10-08 09:19:53 +03:00
Georgi Gerganov b6d6c5289f
sync : llama.cpp 2024-10-06 12:53:28 +03:00
Georgi Gerganov 58b16695e1
sync : ggml 2024-10-05 15:53:49 +03:00
Georgi Gerganov 17880771ad
sync : ggml 2024-10-04 18:50:25 +03:00
Georgi Gerganov 1bb8a64ebf
sync : ggml 2024-10-03 21:17:49 +03:00
Diego Devesa c83ad6d01e
ggml-backend : add device and backend reg interfaces (#9707)
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-10-03 01:49:47 +02:00
Georgi Gerganov f1b8c42711
sync : ggml 2024-10-01 16:09:42 +03:00
Georgi Gerganov d0b1d663e4
sync : ggml 2024-09-29 21:16:07 +03:00
Georgi Gerganov bb5f819975
sync : ggml 2024-09-24 11:01:18 +03:00
Georgi Gerganov 4301535326 sync : ggml
ggml-ci
2024-09-20 21:15:05 +03:00
Georgi Gerganov 0d2f22e45c
scripts : verify py deps at the start of compare (#9520) 2024-09-18 18:34:32 +03:00
Georgi Gerganov 385decbd63 sync : ggml 2024-09-08 11:05:55 +03:00
Georgi Gerganov 60a3107ccd scripts : option to increase git patch context 2024-09-08 11:05:55 +03:00
Georgi Gerganov 231cff5f6f sync : ggml 2024-08-27 22:41:27 +03:00
Georgi Gerganov 4305b57c80
sync : ggml 2024-08-09 10:03:48 +03:00
Georgi Gerganov afd27f01fe
scripts : sync cann files (#0) 2024-08-08 14:56:52 +03:00
Georgi Gerganov 366d486c16
scripts : fix sync filenames (#0) 2024-08-08 14:40:12 +03:00
Georgi Gerganov e44a561ab0
sync : ggml 2024-08-08 13:19:47 +03:00
Georgi Gerganov 5587e57a76 sync : ggml
ggml-ci
2024-08-05 08:50:57 +03:00
Georgi Gerganov 5e2727fe03
scripts : sync vulkan-shaders (#0) 2024-07-27 18:08:47 +03:00
Georgi Gerganov 56f20aa25d
scripts : sync ggml-aarch64 sources 2024-07-27 18:07:33 +03:00
Georgi Gerganov ae7985cd7b sync : ggml
ggml-ci
2024-07-27 17:43:44 +03:00
Georgi Gerganov 3f2d538b81
scripts : fix sync for sycl 2024-07-08 13:51:31 +03:00
Georgi Gerganov 2ee44c9a18 sync : ggml
ggml-ci
2024-07-08 12:23:00 +03:00
compilade 3fd62a6b1c
py : type-check all Python scripts with Pyright (#8341)
* py : type-check all Python scripts with Pyright

* server-tests : use trailing slash in openai base_url

* server-tests : add more type annotations

* server-tests : strip "chat" from base_url in oai_chat_completions

* server-tests : model metadata is a dict

* ci : disable pip cache in type-check workflow

The cache is not shared between branches, and it's 250MB in size,
so it would become quite a big part of the 10GB cache limit of the repo.

* py : fix new type errors from master branch

* tests : fix test-tokenizer-random.py

Apparently, gcc applies optimisations even when pre-processing,
which confuses pycparser.

* ci : only show warnings and errors in python type-check

The "information" level otherwise has entries
from 'examples/pydantic_models_to_grammar.py',
which could be confusing for someone trying to figure out what failed,
considering that these messages can safely be ignored
even though they look like errors.
2024-07-07 15:04:39 -04:00
Georgi Gerganov e235b267a2
py : switch to snake_case (#8305)
* py : switch to snake_case

ggml-ci

* cont

ggml-ci

* cont

ggml-ci

* cont : fix link

* gguf-py : use snake_case in scripts entrypoint export

* py : rename requirements for convert_legacy_llama.py

Needed for scripts/check-requirements.sh

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-05 07:53:33 +03:00
ditsuke 821922916f fix: Update script paths in CI scripts 2024-07-04 15:39:13 +00:00
Clint Herron 07a3fc0608
Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) 2024-07-02 12:18:10 -04:00
Georgi Gerganov c70d117c37
scripts : fix filename sync 2024-06-26 23:25:22 +03:00
Georgi Gerganov f2d48fffde
sync : ggml 2024-06-26 19:39:19 +03:00
Georgi Gerganov f3f65429c4
llama : reorganize source code + improve CMake (#8006)
* scripts : update sync [no ci]

* files : relocate [no ci]

* ci : disable kompute build [no ci]

* cmake : fixes [no ci]

* server : fix mingw build

ggml-ci

* cmake : minor [no ci]

* cmake : link math library [no ci]

* cmake : build normal ggml library (not object library) [no ci]

* cmake : fix kompute build

ggml-ci

* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE

ggml-ci

* move public backend headers to the public include directory (#8122)

* move public backend headers to the public include directory

* nix test

* spm : fix metal header

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* scripts : fix sync paths [no ci]

* scripts : sync ggml-blas.h [no ci]

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
jaime-m-p 37bef89433
tokenizer : BPE fixes (#7530)
* Random test: add_bos_token, add_eos_token
* Random test: add BPE models for testing
* Custom regex split fails with codepoint 0
* Fix falcon punctuation regex
* Refactor llm_tokenizer_bpe: move code to constructor
* Move 'add_special_bos/eos' logic to llm_tokenizer_bpe
* Move tokenizer flags to vocab structure.
* Default values for special_add_bos/eos
* Build vocab.special_tokens_cache using vocab token types
* Generalize 'jina-v2' per token attributes
* Fix unicode whitespaces (deepseek-coder, deepseek-llm)
* Skip missing byte tokens (falcon)
* Better unicode data generation
* Replace char32_t with uint32_t
2024-06-18 18:40:52 +02:00
Georgi Gerganov 5326bcceeb
ggml : sync 2024-06-18 09:50:45 +03:00
Olivier Chafik 1c641e6aac
`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew

* server: update refs -> llama-server

gitignore llama-server

* server: simplify nix package

* main: update refs -> llama

fix examples/main ref

* main/server: fix targets

* update more names

* Update build.yml

* rm accidentally checked in bins

* update straggling refs

* Update .gitignore

* Update server-llm.sh

* main: target name -> llama-cli

* Prefix all example bins w/ llama-

* fix main refs

* rename {main->llama}-cmake-pkg binary

* prefix more cmake targets w/ llama-

* add/fix gbnf-validator subfolder to cmake

* sort cmake example subdirs

* rm bin files

* fix llama-lookup-* Makefile rules

* gitignore /llama-*

* rename Dockerfiles

* rename llama|main -> llama-cli; consistent RPM bin prefixes

* fix some missing -cli suffixes

* rename dockerfile w/ llama-cli

* rename(make): llama-baby-llama

* update dockerfile refs

* more llama-cli(.exe)

* fix test-eval-callback

* rename: llama-cli-cmake-pkg(.exe)

* address gbnf-validator unused fread warning (switched to C++ / ifstream)

* add two missing llama- prefixes

* Updating docs for eval-callback binary to use new `llama-` prefix.

* Updating a few lingering doc references for rename of main to llama-cli

* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.

* Updating documentation references for lookup-merge and export-lora

* Updating two small `main` references missed earlier in the finetune docs.

* Update apps.nix

* update grammar/README.md w/ new llama-* names

* update llama-rpc-server bin name + doc

* Revert "update llama-rpc-server bin name + doc"

This reverts commit e474ef1df4.

* add hot topic notice to README.md

* Update README.md

* Update README.md

* rename gguf-split & quantize bins refs in **/tests.sh

---------

Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
Georgi Gerganov 1442677f92
common : refactor cli arg parsing (#7675)
* common : gpt_params_parse do not print usage

* common : rework usage print (wip)

* common : valign

* common : rework print_usage

* infill : remove cfg support

* common : reorder args

* server : deduplicate parameters

ggml-ci

* common : add missing header

ggml-ci

* common : remote --random-prompt usages

ggml-ci

* examples : migrate to gpt_params

ggml-ci

* batched-bench : migrate to gpt_params

* retrieval : migrate to gpt_params

* common : change defaults for escape and n_ctx

* common : remove chatml and instruct params

ggml-ci

* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
Georgi Gerganov 554c247caf
ggml : remove OpenCL (#7735)
ggml-ci
2024-06-04 21:23:20 +03:00
slaren adc9ff3841
llama-bench : allow using a different printer for stderr with -oe (#7722)
compare-commits.sh : hide stdout, use -oe to print markdown
2024-06-04 14:32:42 +02:00
Johannes Gäßler c8047d538f
scripts: update compare_llama_bench.py [no ci] (#7673) 2024-05-31 16:26:21 +02:00
Galunid 9c4c9cc83f
Move convert.py to examples/convert-legacy-llama.py (#7430)
* Move convert.py to examples/convert-no-torch.py

* Fix CI, scripts, readme files

* convert-no-torch -> convert-legacy-llama

* Move vocab thing to vocab.py

* Fix convert-no-torch -> convert-legacy-llama

* Fix lost convert.py in ci/run.sh

* Fix imports

* Fix gguf not imported correctly

* Fix flake8 complaints

* Fix check-requirements.sh

* Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE

* Review fixes
2024-05-30 21:40:00 +10:00
Georgi Gerganov 00281b7be3
scripts : remove mpi remnants 2024-05-29 14:31:18 +03:00
Georgi Gerganov 2ab977282b
sync : ggml 2024-05-29 14:29:52 +03:00