llama.cpp

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	e17991c466	sync : ggml ggml-ci	2025-07-02 20:08:45 +03:00
Georgi Gerganov	f61c05d4b1	sync : ggml ggml-ci	2025-07-01 11:06:39 +03:00
Vedran Miletić	e9b6350e61	scripts : make the shell scripts cross-platform (#14341 )	2025-06-30 10:17:18 +02:00
Georgi Gerganov	06cbedfca1	sync : ggml ggml-ci	2025-06-20 21:02:47 +03:00
Georgi Gerganov	d03172cc79	sync : ggml ggml-ci	2025-06-18 09:59:21 +03:00
Aman Gupta	2e42be42bd	compare-llama-bench: add option to plot (#14169 ) * compare llama-bench: add option to plot * Address review comments: convert case + add type hints * Add matplotlib to requirements * fix tests * Improve comment and fix assert condition for test * Add back default test_name, add --plot_log_scale * use log_scale regardless of x_values	2025-06-14 10:34:20 +02:00
Georgi Gerganov	ae92c1855b	sync : ggml ggml-ci	2025-06-10 18:39:33 +03:00
Georgi Gerganov	b8e2194efc	sync : ggml ggml-ci	2025-06-10 09:21:56 +03:00
Georgi Gerganov	f3a4b1659c	sync : ggml ggml-ci	2025-06-01 13:43:57 +03:00
Georgi Gerganov	53f925074d	sync : vendor (#13901 ) * sync : vendor ggml-ci * cont : fix httplib version ggml-ci * cont : fix lint * cont : fix lint * vendor : move to common folder /vendor ggml-ci * cont : fix lint * cont : move httplib to /vendor + use json_fwd.hpp ggml-ci * cont : fix server build ggml-ci * cont : add missing headers ggml-ci * cont : header clean-up ggml-ci	2025-05-30 16:25:45 +03:00
Georgi Gerganov	1c49c70d07	sync : ggml	2025-05-27 18:05:33 +03:00
Georgi Gerganov	a26c4cc11e	scripts : add option to compare commits in Debug (#13806 ) * scripts : add option to compare commits in Debug * cont : reuse existing CMAKE_OPTS	2025-05-26 22:24:01 +03:00
Olivier Chafik	f5cd27b71d	`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379 ) * add common_json w/ support for truncated json healing * add common_chat_msg_diff * partial common_chat_parse * refactor parser w/ optionals * server: wire chat diffs in stream mode * fix trigger of thinking models (must happen after thoughts are closed) * fix functionary v3.2 raw python! * rename: common_chat_syntax (now contains format) * rm common_regex.at_start * don't return empty <think></think> * accommodate yet another deepseek r1 distill fantasy syntax (`<｜tool▁calls｜>`) * fix QwQ 32B tool call parsing after thoughts (hermes2) * better logs for grammar triggers * consume spaces after parse_json_tool_calls * fix required tool calls w/ thinking models that have pre-opened thinking tags * fix thinking model's initial trigger + test qwq's template * run most test_tool_call tests in stream + non-stream modes * make functionary v3.2 parsing more strict (differentiate first match from others) * send final diff from server, to close off raw python arguments * support partial content streaming in Generic mode * tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5) * Update function-calling.md * Update tool_bench.py * chat-parser: remove input from exception (llm output may contain PII) --------- Co-authored-by: ochafik <ochafik@google.com> Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>	2025-05-25 01:48:08 +01:00
Georgi Gerganov	d30cb5a7fa	sync : ggml ggml-ci	2025-05-19 13:29:56 +03:00
Sigbjørn Skjæret	be1d4a13db	scripts : fix compare-llama-bench.py show parameter (#13514 )	2025-05-14 08:41:01 +02:00
Sigbjørn Skjæret	bf79371120	scripts : support arbitrary input file formats in compare-llama-bench.py (#13455 )	2025-05-13 15:31:12 +02:00
Georgi Gerganov	1e2809bc4b	sync : ggml	2025-05-13 14:02:28 +03:00
Sigbjørn Skjæret	09232370fc	scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451 )	2025-05-11 16:20:39 +02:00
Georgi Gerganov	d879433824	sync : ggml ggml-ci	2025-05-07 17:28:36 +03:00
Diego Devesa	1d36b3670b	llama : move end-user examples to tools directory (#13249 ) * llama : move end-user examples to tools directory --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-05-02 20:27:13 +02:00
Georgi Gerganov	b34443923c	sync : ggml (#13268 ) * vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204) * vulkan : add kernels for depthwise 2d convolution (OP_CONV_2D_DW) * review: remove src_x/y < 0 checks; add performance tests * sync : ggml ggml-ci * vulkan : fix lint (#0) --------- Co-authored-by: Acly <aclysia@gmail.com>	2025-05-02 20:54:30 +03:00
Georgi Gerganov	b1dd4d08e8	sync : ggml ggml-ci	2025-05-01 20:15:34 +03:00
Georgi Gerganov	8d33d740c3	sync : ggml	2025-05-01 10:00:39 +03:00
Johannes Gäßler	19e899ce21	scripts: n_depth for compare-llama-bench [no ci] (#13201 )	2025-04-29 23:32:04 +02:00
Georgi Gerganov	63b4911494	sync : ggml ggml-ci	2025-04-24 17:32:47 +03:00
Georgi Gerganov	526739b879	sync : ggml ggml-ci	2025-04-14 09:26:15 +03:00
Georgi Gerganov	47ba87d0a4	sync : ggml	2025-04-11 00:17:47 +03:00
Georgi Gerganov	eb420e1148	sync : ggml ggml-ci	2025-04-11 00:17:47 +03:00
Georgi Gerganov	e4bf72d631	scripts : fix sync-ggml-am.sh	2025-04-11 00:17:47 +03:00
Georgi Gerganov	a4e46e28f9	sync : ggml ggml-ci	2025-04-07 18:44:17 +03:00
Georgi Gerganov	0114a32da0	sync : ggml ggml-ci	2025-03-31 15:07:32 +03:00
Georgi Gerganov	d3f1f0acfb	sync : ggml ggml-ci	2025-03-30 08:33:31 +03:00
Georgi Gerganov	029c693fdc	sync : ggml ggml-ci	2025-03-27 10:09:29 +02:00
Georgi Gerganov	771d84371c	scripts : update sync + fix cmake merge ggml-ci	2025-03-27 10:09:29 +02:00
Georgi Gerganov	df0665a483	sync : ggml ggml-ci	2025-03-27 09:04:38 +02:00
Georgi Gerganov	102ac1891d	sync : ggml ggml-ci	2025-03-07 14:49:44 +02:00
Olivier Chafik	669912d9a5	`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034 ) * sampler: turn lazy grammar trigger words to regexes * add scripts/tool_bench.sh & .py * constrain llama json output regardless of function name if matches at beginning * update relaxed newline space rule in grammar tests * support add_generation_prompt query parameter (useful for /apply_template) * Update src/llama-grammar.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-03-05 13:05:13 +00:00
Daniel Bevenius	a057897ad4	llama : add xcframework build script (#11996 ) * llama : add xcframework build script This commit adds a script to build an XCFramework for Apple ios, macos, visionos, and tvos platforms. The generated XCFramework can then be added to a project and used in the same way as a regular framework. The llama.swiftui example project has been updated to use the XCFramework and can be started using the following command: ```console $ open examples/llama.swiftui/llama.swiftui.xcodeproj/ ``` Refs: https://github.com/ggml-org/llama.cpp/issues/10747 * examples : remove llama.cpp (source dir ref) from project.pbxproj This commit removes the reference to llama.cpp from the project.pbxproj file since Package.swift has been removed. * ci : updated build.yml to use build-xcframework.sh * ci : add xcframework build to github releases This commit adds the ability to create a GitHub release with the xcframework build artifact. * scripts : add apple app validation scripts This commit adds scripts that can validate the iOS, macOS, tvOS, and VisionOS applications. The scripts create a simple test app project, copy the llama.xcframework to the test project, build and archive the app, create an IPA from the archive, and validate the IPA using altool. The motivation for this is to provide some basic validation and hopefully avoid having to manually validate apps in Xcode. * llama : remove Package.swift This commit removes the Package.swift file, as we are now building an XCFramework for the project. * llama : remove Sources and spm-headers directories * llama : use TargetConditionals.h for visionOS/tvOS	2025-03-05 06:30:31 +01:00
Georgi Gerganov	dfd6b2c0be	sync : ggml ggml-ci	2025-03-03 18:18:11 +02:00
Georgi Gerganov	3d1cf3cf33	sync : ggml ggml-ci	2025-03-03 18:18:11 +02:00
Georgi Gerganov	8371d44595	sync : ggml ggml-ci	2025-03-03 18:18:11 +02:00
Georgi Gerganov	aede2074f6	scripts : sync-ggml-am.sh fix	2025-03-03 18:18:11 +02:00
MoonRide303	5137da7b8c	scripts: corrected encoding when getting chat template (#11866 ) (#11907 ) Signed-off-by: MoonRide303 <moonride303@gmail.com>	2025-02-18 10:30:16 +01:00
Johannes Gäßler	6dde178248	scripts: fix compare-llama-bench commit hash logic (#11891 )	2025-02-15 20:23:22 +01:00
Georgi Gerganov	68ff663a04	repo : update links to new url (#11886 ) * repo : update links to new url ggml-ci * cont : more urls ggml-ci	2025-02-15 16:40:57 +02:00
Olivier Chafik	c7f460ab88	`server`: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command 7RB & DeepSeek R1) unless `--reasoning-format none` (#11607 ) * extract & return thoughts in reasoning_content field (unless --reasoning-format) for DeepSeek R1 & Command R7B * tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template * tool-calls: accommodate variety of wrong tool call opening tags both R1 Qwen 32B and 7B distills like to spit out * server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability * tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-13 10:05:16 +00:00
Georgi Gerganov	0fb77f821f	sync : ggml	2025-02-12 21:46:02 +02:00
Georgi Gerganov	8a59053f63	sync : ggml	2025-02-06 21:23:03 +02:00
Georgi Gerganov	7c9e0ca520	sync : ggml	2025-02-04 12:59:21 +02:00
Georgi Gerganov	8ec05832fa	sync : ggml	2025-02-03 14:57:08 +02:00
Olivier Chafik	8b576b6c55	Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639 ) --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-30 19:13:58 +00:00
Georgi Gerganov	815857791d	sync : ggml	2025-01-29 11:25:29 +02:00
Olivier Chafik	6171c9d258	Add Jinja template support (#11016 ) * Copy minja from `58f0ca6dd7` * Add --jinja and --chat-template-file flags * Add missing <optional> include * Avoid print in get_hf_chat_template.py * No designated initializers yet * Try and work around msvc++ non-macro max resolution quirk * Update test_chat_completion.py * Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template * Refactor test-chat-template * Test templates w/ minja * Fix deprecation * Add --jinja to llama-run * Update common_chat_format_example to use minja template wrapper * Test chat_template in e2e test * Update utils.py * Update test_chat_completion.py * Update run.cpp * Update arg.cpp * Refactor common_chat_* functions to accept minja template + use_jinja option * Attempt to fix linkage of LLAMA_CHATML_TEMPLATE * Revert LLAMA_CHATML_TEMPLATE refactor * Normalize newlines in test-chat-templates for windows tests * Forward decl minja::chat_template to avoid eager json dep * Flush stdout in chat template before potential crash * Fix copy elision warning * Rm unused optional include * Add missing optional include to server.cpp * Disable jinja test that has a cryptic windows failure * minja: fix vigogne (https://github.com/google/minja/pull/22) * Apply suggestions from code review Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Finish suggested renamings * Move chat_templates inside server_context + remove mutex * Update --chat-template-file w/ recent change to --chat-template * Refactor chat template validation * Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr) * Warn against missing eos / bos tokens when jinja template references them * rename: common_chat_template[s] * reinstate assert on chat_templates.template_default * Update minja to `b8437df626` * Update minja to https://github.com/google/minja/pull/25 * Update minja from https://github.com/google/minja/pull/27 * rm unused optional header --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-21 13:18:51 +00:00
Georgi Gerganov	f26c874179	scripts : restore hf.sh (#11288 ) ggml-ci	2025-01-18 13:18:32 +02:00
Georgi Gerganov	f11cfdfd7f	ci : use -no-cnv in gguf-split tests (#11254 ) * ci : use -no-cnv in gguf-split tests ggml-ci * ci : use -no-cnv in requantize tests ggml-ci * scripts : fix [no ci]	2025-01-15 18:28:35 +02:00
Georgi Gerganov	44d1e796d0	sync : ggml	2025-01-14 10:39:42 +02:00
Georgi Gerganov	a4f3f5d8e6	scripts : sync gguf (cont)	2025-01-14 09:40:52 +02:00
Georgi Gerganov	48e1ae0e61	scripts : sync gguf	2025-01-14 09:36:58 +02:00
Georgi Gerganov	d00a80e89d	scripts : sync opencl	2025-01-14 09:19:58 +02:00
Georgi Gerganov	99a3755a3c	sync : ggml	2025-01-08 13:40:30 +02:00
Georgi Gerganov	78c6785175	sync : ggml	2025-01-04 16:09:53 +02:00
Djip007	2cd43f4900	ggml : more perfo with llamafile tinyblas on x86_64 (#10714 ) * more perfo with llamafile tinyblas on x86_64. - add bf16 suport - change dispache strategie (thanks: https://github.com/ikawrakow/ik_llama.cpp/pull/71 ) - reduce memory bandwidth simple tinyblas dispache and more cache freindly * tinyblas dynamic dispaching * sgemm: add M blocs. * - git 2.47 use short id of len 9. - show-progress is not part of GNU Wget2 * remove not stable test	2024-12-24 18:54:49 +01:00
Georgi Gerganov	5437d4aaf5	sync : ggml	2024-12-17 18:36:02 +02:00
Georgi Gerganov	87cf323cef	scripts : change build path to "build-bench" for compare-commits.sh (#10836 )	2024-12-15 18:44:47 +02:00
Georgi Gerganov	0cd182ebcc	sync : ggml	2024-12-05 13:27:42 +02:00
Diego Devesa	59f4db1088	ggml : add predefined list of CPU backend variants to build (#10626 ) * ggml : add predefined list of CPU backend variants to build * update CPU dockerfiles	2024-12-04 14:45:40 +01:00
Georgi Gerganov	1cd3df46bd	scripts : remove amx sync ggml-ci	2024-12-03 20:04:49 +02:00
Georgi Gerganov	c505471857	sync : ggml	2024-12-03 20:04:49 +02:00
Georgi Gerganov	8648c52101	make : deprecate (#10514 ) * make : deprecate ggml-ci * ci : disable Makefile builds ggml-ci * docs : remove make references [no ci] * ci : disable swift build ggml-ci * docs : remove obsolete make references, scripts, examples ggml-ci * basic fix for compare-commits.sh * update build.md * more build.md updates * more build.md updates * more build.md updates * Update Makefile Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-12-02 21:22:53 +02:00
Diego Devesa	3420909dff	ggml : automatic selection of best CPU backend (#10606 ) * ggml : automatic selection of best CPU backend * amx : minor opt * add GGML_AVX_VNNI to enable avx-vnni, fix checks	2024-12-01 16:12:41 +01:00
Georgi Gerganov	fee824a1a1	sync : ggml	2024-11-27 11:10:42 +02:00
Georgi Gerganov	87a533be57	sync : ggml	2024-11-21 09:22:11 +02:00
Georgi Gerganov	9fe0fb0626	sync : ggml	2024-11-19 20:03:21 +02:00
Georgi Gerganov	5c9a8b22b1	scripts : update sync	2024-11-17 08:30:29 +02:00
Johannes Gäßler	4e54be0ec6	llama/ex: remove --logdir argument (#10339 )	2024-11-16 23:00:41 +01:00
Georgi Gerganov	f245cc28d4	scripts : fix missing key in compare-llama-bench.py (#10332 )	2024-11-16 10:32:50 +02:00
Johannes Gäßler	4047be74da	scripts: update compare-llama-bench.py (#10319 )	2024-11-15 21:19:03 +01:00
Georgi Gerganov	cbf5541a82	sync : ggml	2024-11-15 15:44:06 +02:00
Georgi Gerganov	4802ad350b	scripts : fix regex in sync [no ci]	2024-11-15 08:38:43 +02:00
Georgi Gerganov	5ea926dad7	sync : ggml	2024-11-13 18:11:54 +02:00
Georgi Gerganov	eec4d71737	scripts : add amx to sync-ggml.sh [no ci]	2024-11-07 23:11:36 +02:00
Georgi Gerganov	3b08828674	sync : ggml	2024-11-07 23:08:24 +02:00
Georgi Gerganov	a2c6fd747c	scripts : sync update	2024-11-07 23:07:55 +02:00
Georgi Gerganov	ce027adfb3	sync : ggml	2024-11-04 10:33:37 +02:00
Georgi Gerganov	815fe72adc	sync : ggml	2024-11-01 10:28:24 +02:00
Diego Devesa	c5b0f4b5d9	llama : refactor model loader with backend registry (#10026 )	2024-10-30 02:01:23 +01:00
Georgi Gerganov	8d8ff71536	llama : remove Tail-Free sampling (#10071 ) ggml-ci	2024-10-29 10:42:05 +02:00
Georgi Gerganov	cc2983d375	sync : ggml	2024-10-26 10:34:08 +03:00
Georgi Gerganov	9e4a2563ea	scripts : fix amx sync [no ci]	2024-10-26 10:33:31 +03:00
Georgi Gerganov	190a37d797	sync : ggml	2024-10-23 17:23:55 +03:00
Georgi Gerganov	17bb928080	readme : remove --memory-f32 references (#9925 )	2024-10-17 23:43:05 +03:00
Georgi Gerganov	0e41b300ed	sync : ggml	2024-10-16 11:28:14 +03:00
standby24x7	fa42aa6d89	scripts : fix spelling typo in messages and comments (#9782 ) Signed-off-by: Masanari Iida <standby24x7@gmail.com>	2024-10-08 09:19:53 +03:00
Georgi Gerganov	b6d6c5289f	sync : llama.cpp	2024-10-06 12:53:28 +03:00
Georgi Gerganov	58b16695e1	sync : ggml	2024-10-05 15:53:49 +03:00
Georgi Gerganov	17880771ad	sync : ggml	2024-10-04 18:50:25 +03:00
Georgi Gerganov	1bb8a64ebf	sync : ggml	2024-10-03 21:17:49 +03:00
Diego Devesa	c83ad6d01e	ggml-backend : add device and backend reg interfaces (#9707 ) Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-10-03 01:49:47 +02:00
Georgi Gerganov	f1b8c42711	sync : ggml	2024-10-01 16:09:42 +03:00
Georgi Gerganov	d0b1d663e4	sync : ggml	2024-09-29 21:16:07 +03:00
Georgi Gerganov	bb5f819975	sync : ggml	2024-09-24 11:01:18 +03:00
Georgi Gerganov	4301535326	sync : ggml ggml-ci	2024-09-20 21:15:05 +03:00
Georgi Gerganov	0d2f22e45c	scripts : verify py deps at the start of compare (#9520 )	2024-09-18 18:34:32 +03:00
Georgi Gerganov	385decbd63	sync : ggml	2024-09-08 11:05:55 +03:00
Georgi Gerganov	60a3107ccd	scripts : option to increase git patch context	2024-09-08 11:05:55 +03:00
Georgi Gerganov	231cff5f6f	sync : ggml	2024-08-27 22:41:27 +03:00
Georgi Gerganov	4305b57c80	sync : ggml	2024-08-09 10:03:48 +03:00
Georgi Gerganov	afd27f01fe	scripts : sync cann files (#0 )	2024-08-08 14:56:52 +03:00
Georgi Gerganov	366d486c16	scripts : fix sync filenames (#0 )	2024-08-08 14:40:12 +03:00
Georgi Gerganov	e44a561ab0	sync : ggml	2024-08-08 13:19:47 +03:00
Georgi Gerganov	5587e57a76	sync : ggml ggml-ci	2024-08-05 08:50:57 +03:00
Georgi Gerganov	5e2727fe03	scripts : sync vulkan-shaders (#0 )	2024-07-27 18:08:47 +03:00
Georgi Gerganov	56f20aa25d	scripts : sync ggml-aarch64 sources	2024-07-27 18:07:33 +03:00
Georgi Gerganov	ae7985cd7b	sync : ggml ggml-ci	2024-07-27 17:43:44 +03:00
Georgi Gerganov	3f2d538b81	scripts : fix sync for sycl	2024-07-08 13:51:31 +03:00
Georgi Gerganov	2ee44c9a18	sync : ggml ggml-ci	2024-07-08 12:23:00 +03:00
compilade	3fd62a6b1c	py : type-check all Python scripts with Pyright (#8341 ) * py : type-check all Python scripts with Pyright * server-tests : use trailing slash in openai base_url * server-tests : add more type annotations * server-tests : strip "chat" from base_url in oai_chat_completions * server-tests : model metadata is a dict * ci : disable pip cache in type-check workflow The cache is not shared between branches, and it's 250MB in size, so it would become quite a big part of the 10GB cache limit of the repo. * py : fix new type errors from master branch * tests : fix test-tokenizer-random.py Apparently, gcc applies optimisations even when pre-processing, which confuses pycparser. * ci : only show warnings and errors in python type-check The "information" level otherwise has entries from 'examples/pydantic_models_to_grammar.py', which could be confusing for someone trying to figure out what failed, considering that these messages can safely be ignored even though they look like errors.	2024-07-07 15:04:39 -04:00
Georgi Gerganov	e235b267a2	py : switch to snake_case (#8305 ) * py : switch to snake_case ggml-ci * cont ggml-ci * cont ggml-ci * cont : fix link * gguf-py : use snake_case in scripts entrypoint export * py : rename requirements for convert_legacy_llama.py Needed for scripts/check-requirements.sh --------- Co-authored-by: Francis Couture-Harpin <git@compilade.net>	2024-07-05 07:53:33 +03:00
ditsuke	821922916f	fix: Update script paths in CI scripts	2024-07-04 15:39:13 +00:00
Clint Herron	07a3fc0608	Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258 )	2024-07-02 12:18:10 -04:00
Georgi Gerganov	c70d117c37	scripts : fix filename sync	2024-06-26 23:25:22 +03:00
Georgi Gerganov	f2d48fffde	sync : ggml	2024-06-26 19:39:19 +03:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006 ) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00
jaime-m-p	37bef89433	tokenizer : BPE fixes (#7530 ) * Random test: add_bos_token, add_eos_token * Random test: add BPE models for testing * Custom regex split fails with codepoint 0 * Fix falcon punctuation regex * Refactor llm_tokenizer_bpe: move code to constructor * Move 'add_special_bos/eos' logic to llm_tokenizer_bpe * Move tokenizer flags to vocab structure. * Default values for special_add_bos/eos * Build vocab.special_tokens_cache using vocab token types * Generalize 'jina-v2' per token attributes * Fix unicode whitespaces (deepseek-coder, deepseek-llm) * Skip missing byte tokens (falcon) * Better unicode data generation * Replace char32_t with uint32_t	2024-06-18 18:40:52 +02:00
Georgi Gerganov	5326bcceeb	ggml : sync	2024-06-18 09:50:45 +03:00
Olivier Chafik	1c641e6aac	`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809 ) * `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew * server: update refs -> llama-server gitignore llama-server * server: simplify nix package * main: update refs -> llama fix examples/main ref * main/server: fix targets * update more names * Update build.yml * rm accidentally checked in bins * update straggling refs * Update .gitignore * Update server-llm.sh * main: target name -> llama-cli * Prefix all example bins w/ llama- * fix main refs * rename {main->llama}-cmake-pkg binary * prefix more cmake targets w/ llama- * add/fix gbnf-validator subfolder to cmake * sort cmake example subdirs * rm bin files * fix llama-lookup-* Makefile rules * gitignore /llama-* * rename Dockerfiles * rename llama\|main -> llama-cli; consistent RPM bin prefixes * fix some missing -cli suffixes * rename dockerfile w/ llama-cli * rename(make): llama-baby-llama * update dockerfile refs * more llama-cli(.exe) * fix test-eval-callback * rename: llama-cli-cmake-pkg(.exe) * address gbnf-validator unused fread warning (switched to C++ / ifstream) * add two missing llama- prefixes * Updating docs for eval-callback binary to use new `llama-` prefix. * Updating a few lingering doc references for rename of main to llama-cli * Updating `run-with-preset.py` to use new binary names. Updating docs around `perplexity` binary rename. * Updating documentation references for lookup-merge and export-lora * Updating two small `main` references missed earlier in the finetune docs. * Update apps.nix * update grammar/README.md w/ new llama-* names * update llama-rpc-server bin name + doc * Revert "update llama-rpc-server bin name + doc" This reverts commit `e474ef1df4`. * add hot topic notice to README.md * Update README.md * Update README.md * rename gguf-split & quantize bins refs in **/tests.sh --------- Co-authored-by: HanClinto <hanclinto@gmail.com>	2024-06-13 00:41:52 +01:00
Georgi Gerganov	1442677f92	common : refactor cli arg parsing (#7675 ) * common : gpt_params_parse do not print usage * common : rework usage print (wip) * common : valign * common : rework print_usage * infill : remove cfg support * common : reorder args * server : deduplicate parameters ggml-ci * common : add missing header ggml-ci * common : remote --random-prompt usages ggml-ci * examples : migrate to gpt_params ggml-ci * batched-bench : migrate to gpt_params * retrieval : migrate to gpt_params * common : change defaults for escape and n_ctx * common : remove chatml and instruct params ggml-ci * common : passkey use gpt_params	2024-06-04 21:23:39 +03:00
Georgi Gerganov	554c247caf	ggml : remove OpenCL (#7735 ) ggml-ci	2024-06-04 21:23:20 +03:00
slaren	adc9ff3841	llama-bench : allow using a different printer for stderr with -oe (#7722 ) compare-commits.sh : hide stdout, use -oe to print markdown	2024-06-04 14:32:42 +02:00
Johannes Gäßler	c8047d538f	scripts: update compare_llama_bench.py [no ci] (#7673 )	2024-05-31 16:26:21 +02:00
Galunid	9c4c9cc83f	Move convert.py to examples/convert-legacy-llama.py (#7430 ) * Move convert.py to examples/convert-no-torch.py * Fix CI, scripts, readme files * convert-no-torch -> convert-legacy-llama * Move vocab thing to vocab.py * Fix convert-no-torch -> convert-legacy-llama * Fix lost convert.py in ci/run.sh * Fix imports * Fix gguf not imported correctly * Fix flake8 complaints * Fix check-requirements.sh * Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE * Review fixes	2024-05-30 21:40:00 +10:00
Georgi Gerganov	00281b7be3	scripts : remove mpi remnants	2024-05-29 14:31:18 +03:00
Georgi Gerganov	2ab977282b	sync : ggml	2024-05-29 14:29:52 +03:00
slaren	d359f30921	llama : remove MPI backend (#7395 )	2024-05-20 01:17:03 +02:00
jaime-m-p	b43272afa2	Unicode codepoint flags for custom regexs (#7245 ) * Replace CODEPOINT_TYPE_* with codepoint_flags * Update and bugfix brute force random test * Deterministic brute force random test * Unicode normalization NFD * Get rid of BOM	2024-05-18 01:09:13 +02:00
Brian	51e9d02599	Added a single test function script and fix debug-test.sh to be more robust (#7279 ) * run-single-test.sh: added a single test function script and fix debug-test.sh to be more robust * debug-test.sh: combined execute and gdb test mode via -g flag * debug-test.sh: refactor * debug-test: refactor for clarity * debug-test.sh: comment style changes * debug-test.sh: fix gdb	2024-05-17 22:40:14 +10:00
Georgi Gerganov	29499bb593	sync : ggml	2024-05-15 13:23:41 +03:00
Georgi Gerganov	9f773486ab	script : sync ggml-rpc	2024-05-14 19:14:38 +03:00
Georgi Gerganov	a5e3fde857	sync : ggml ggml-ci	2024-05-14 19:08:09 +03:00
Georgi Gerganov	7bd4ffb780	metal : fix warnings (skipme) (#0 )	2024-05-11 21:38:13 +03:00
Georgi Gerganov	1622ac023f	sync : ggml	2024-05-11 21:35:05 +03:00
Josh Ramer	fed0108491	Scripting & documenting debugging one test without anything else in the loop. (#7096 ) * A little documentation that shares my quick tips for working in the repository. * Update startup-testing-debugging.md * script that shows a menu of tests to pick from & run the debugger on * debug-test.sh: Refactor CLI help message * debug-test.sh: documentation update * debug-test.sh: CLI Help output corrections * debug-test.sh: minor doc fix --------- authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal> Assisted-by: brian khuu <mofosyne@gmail.com>	2024-05-12 03:26:35 +10:00
Georgi Gerganov	fae9d234b6	sync : ggml ggml-ci	2024-05-11 15:38:34 +03:00
slaren	e849648888	llama-bench : add pp+tg test type (#7199 )	2024-05-10 18:03:54 +02:00
jaime-m-p	43248e5594	llama3 custom regex split (#6965 ) * merged the changes from deepseeker models to main branch * Moved regex patterns to unicode.cpp and updated unicode.h * Moved header files * Resolved issues * added and refactored unicode_regex_split and related functions * Updated/merged the deepseek coder pr * Refactored code * Adding unicode regex mappings * Adding unicode regex function * Added needed functionality, testing remains * Fixed issues * Fixed issue with gpt2 regex custom preprocessor * unicode : fix? unicode_wstring_to_utf8 * lint : fix whitespaces * tests : add tokenizer tests for numbers * unicode : remove redundant headers * tests : remove and rename tokenizer test scripts * tests : add sample usage * gguf-py : reader prints warnings on duplicate keys * llama : towards llama3 tokenization support (wip) * unicode : shot in the dark to fix tests on Windows * unicode : first try custom implementations * convert : add "tokenizer.ggml.pre" GGUF KV (wip) * llama : use new pre-tokenizer type * convert : fix pre-tokenizer type writing * lint : fix * make : add test-tokenizer-0-llama-v3 * wip * models : add llama v3 vocab file * llama : adapt punctuation regex + add llama 3 regex * minor * unicode : set bomb * unicode : set bomb * unicode : always use std::wregex * unicode : support \p{N}, \p{L} and \p{P} natively * unicode : try fix windows * unicode : category support via std::regex * unicode : clean-up * unicode : simplify * llama3 custom regex split * convert : add convert-hf-to-gguf-update.py ggml-ci * lint : update * convert : add falcon ggml-ci * unicode : normalize signatures * lint : fix * lint : fix * convert : remove unused functions * convert : add comments * convert : exercise contractions ggml-ci * Using char32_t for codepoints * lint : fix * already exists unicode_tolower() * Typing * Restore BOM * cmake : refactor test targets * tests : refactor vocab tests ggml-ci * tests : add more vocabs and tests ggml-ci * unicode : cleanup * scripts : ignore new update script in check-requirements.sh * Fix merge * models : add phi-3, mpt, gpt-2, starcoder * tests : disable obsolete ggml-ci * tests : use faster bpe test ggml-ci * llama : more prominent warning for old BPE models * tests : disable test-tokenizer-1-bpe due to slowness ggml-ci * Move unused variable value * GPT2 custom regex split * Add alternative regex for custom aplit llama3 Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Style * Add bruteforce random tests for token encoding * wip: fixing unicode codepoint ranges * Fix merge * Unicode tables: separator, lowercase, uppercase and whitespace * llama3 custom regex split: fix \s * Restore BOM * Style * wip: generate NDF table * Ignore special tokens for testing * Clean gen-unicode-data.py * Refactor random tokenizer test * lint : fix * tests : add fail test for llama-bpe --------- Co-authored-by: Jaggzh <jaggz.h@gmail.com> Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: jaime-m-p <>	2024-05-09 23:30:44 +10:00
Brian	acdce3cdef	compare-llama-bench.py: add missing basicConfig (#7138 ) * compare-llama-bench.py: add missing basicConfig * compare-llama-bench.py: Add line break between error message and print_help() * Add regular print() markdown table	2024-05-08 10:54:39 +02:00
Brian	6fbd432211	py : logging and flake8 suppression refactoring (#7081 ) Set one as executable and add basicConfig() to another. Also added noqa tag to test scripts.	2024-05-05 08:07:48 +03:00
Georgi Gerganov	92139b90af	tests : add test-tokenizer-0.sh + fix some tokenizers (#7036 ) * tests : add test-tokenizer-0.sh * unicode : add all unicode number ranges * starcoder : fix pre-tokenizer * tests : add test that fails with DeepSeek tokenizers * falcon : fix regex * unicode : regenerate unicode tables * refact : add tokenizer model * lint : fix * tests : disable failing tests ggml-ci * refact : add tests files ggml-ci * convert : print -> logging ggml-ci * lint : fix * unicode : digit -> number * phi-3 : update	2024-05-04 08:32:32 +03:00
Brian	a2ac89d6ef	convert.py : add python logging instead of print() (#6511 ) * convert.py: add python logging instead of print() * convert.py: verbose flag takes priority over dump flag log suppression * convert.py: named instance logging * convert.py: use explicit logger id string * convert.py: convert extra print() to named logger * convert.py: sys.stderr.write --> logger.error * .py: Convert all python scripts to use logging module requirements.txt: remove extra line * flake8: update flake8 ignore and exclude to match ci settings * gh-actions: add flake8-no-print to flake8 lint step * pre-commit: add flake8-no-print to flake8 and also update pre-commit version * convert-hf-to-gguf.py: print() to logger conversion * .py: logging basiconfig refactor to use conditional expression .py: removed commented out logging fixup! .py: logging basiconfig refactor to use conditional expression constant.py: logger.error then exit should be a raise exception instead * .py: Convert logger error and sys.exit() into a raise exception (for atypical error) gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar * verify-checksum-model.py: This is the result of the program, it should be printed to stdout. * compare-llama-bench.py: add blank line for readability during missing repo response * reader.py: read_gguf_file() use print() over logging * convert.py: warning goes to stderr and won't hurt the dump output * gguf-dump.py: dump_metadata() should print to stdout * convert-hf-to-gguf.py: print --> logger.debug or ValueError() * verify-checksum-models.py: use print() for printing table * .py: refactor logging.basicConfig() gguf-py/gguf/.py: use __name__ as logger name Since they will be imported and not run directly. python-lint.yml: use .flake8 file instead * constants.py: logger no longer required * convert-hf-to-gguf.py: add additional logging * convert-hf-to-gguf.py: print() --> logger * .py: fix flake8 warnings revert changes to convert-hf-to-gguf.py for get_name() * convert-hf-to-gguf-update.py: use triple quoted f-string instead * .py: accidentally corrected the wrong line *.py: add compilade warning suggestions and style fixes	2024-05-03 22:36:41 +03:00
Georgi Gerganov	f4ab2a4147	llama : fix BPE pre-tokenization (#6920 ) * merged the changes from deepseeker models to main branch * Moved regex patterns to unicode.cpp and updated unicode.h * Moved header files * Resolved issues * added and refactored unicode_regex_split and related functions * Updated/merged the deepseek coder pr * Refactored code * Adding unicode regex mappings * Adding unicode regex function * Added needed functionality, testing remains * Fixed issues * Fixed issue with gpt2 regex custom preprocessor * unicode : fix? unicode_wstring_to_utf8 * lint : fix whitespaces * tests : add tokenizer tests for numbers * unicode : remove redundant headers * tests : remove and rename tokenizer test scripts * tests : add sample usage * gguf-py : reader prints warnings on duplicate keys * llama : towards llama3 tokenization support (wip) * unicode : shot in the dark to fix tests on Windows * unicode : first try custom implementations * convert : add "tokenizer.ggml.pre" GGUF KV (wip) * llama : use new pre-tokenizer type * convert : fix pre-tokenizer type writing * lint : fix * make : add test-tokenizer-0-llama-v3 * wip * models : add llama v3 vocab file * llama : adapt punctuation regex + add llama 3 regex * minor * unicode : set bomb * unicode : set bomb * unicode : always use std::wregex * unicode : support \p{N}, \p{L} and \p{P} natively * unicode : try fix windows * unicode : category support via std::regex * unicode : clean-up * unicode : simplify * convert : add convert-hf-to-gguf-update.py ggml-ci * lint : update * convert : add falcon ggml-ci * unicode : normalize signatures * lint : fix * lint : fix * convert : remove unused functions * convert : add comments * convert : exercise contractions ggml-ci * lint : fix * cmake : refactor test targets * tests : refactor vocab tests ggml-ci * tests : add more vocabs and tests ggml-ci * unicode : cleanup * scripts : ignore new update script in check-requirements.sh * models : add phi-3, mpt, gpt-2, starcoder * tests : disable obsolete ggml-ci * tests : use faster bpe test ggml-ci * llama : more prominent warning for old BPE models * tests : disable test-tokenizer-1-bpe due to slowness ggml-ci --------- Co-authored-by: Jaggzh <jaggz.h@gmail.com> Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com>	2024-04-29 16:58:41 +03:00

344 Commits