llama.cpp

Commit Graph

Author	SHA1	Message	Date
Daniel Han	dd0f321941	readme : add Unsloth exporting to GGUF in tools (#17411 )	2025-11-20 20:07:36 +01:00
ixgbe	307772fcda	readme : add RVV,ZVFH,ZFH,ZICBOP support for RISC-V (#17259 ) Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>	2025-11-14 09:12:56 +02:00
Georgi Gerganov	afd353246d	readme : update hot topics (#17002 )	2025-11-04 17:21:31 +02:00
amirai21	8d8862829c	docs : add Jamba to Text-only models list (#16778 )	2025-10-26 13:01:20 +01:00
Max Krasnyansky	63d2fc46e1	Add experimental ggml-hexagon backend for the Hexagon NPU (#16547 ) * model: add support for extra bufs for all devices * hexagon: add experimental ggml-hexagon backend for the Hexagon NPU This commit introduces a new experimental backend `ggml-hexagon` with support for the Hexagon NPU. Highlights: - Supports Hexagon versions: v73, v75, v79, and v81 - Targets Android devices based on Snapdragon SoCs: Gen3, 8-Elite, and 8-Elite Gen5 - Supports Q4_0, Q8_0, MXFP4, and FP32 data types - Implements core LLM ops: MUL_MAT/MUL_MAT_ID, ADD/SUB/MUL/ADD_ID, RMS_NORM, ROPE, GLU/SWIGLU, SOFTMAX Note: This backend is experimental and may exhibit instability or limited performance across supported devices. It is intended for early testing and feedback from llama.cpp/ggml developer and user community. Co-Authored-By: Rajdeep Ganguly <rganguly@qti.qualcomm.com> Co-Authored-By: Todor Boinovski <todorb@qti.qualcomm.com> * hexagon: fix format checker errors * hexagon: update readme and cmake presets * ci: add android-ndk-build jobs that build plain ARM64 and Snapdragon versions * hexagon: add simple graph optimizer for stacking MUL_MAT ops with the same input * hexagon: move ADB helper scripts into scripts/snapdragon/adb * hexagon: replace all f/printfs with GGML_LOG_... * readme: add hexagon to the list supported backends * hexagon: stack malmuts with quantized inputs only * hexagon: add TODO for fixing issues in hexagon_graph_optimize * hexagon: update to hex-sdk 6.4.0 and add scripts for running on QDC * scripts: fix lint errors * scripts: update qdc pytest script to make linter happy * hexagon: add reduce sum in fp32 * hexagon: reduce number of vector stores in matmul output * hexagon: remove the need for vdelta in reduce-multiply-x8 * hexagon: consistent use of reduce_sum_fp32 for row_sums * hexagon: some more matmul optimizations and comments Optimize cases where tensor dims are not multiple of 1024 (e.g in Qwen models). We've handled those cases already but at a higher overhead. * hexagon: update cmake presets * hexagon: add OPMASK support for run-bench.sh wrapper * hexagon: update to use GGML_BACKEND_API * hexagon: remove unused logic for setting tensor flags for the views * hexagon: add asserts to set/get_tensor to make sure we handle complete tensors Same asserts as the CPU backend. * hexagon: use cpy_tensor slow path for non-host buffers * hexagon: error checks in the buffer allocator * cmake: move include(extProj) under ggml-hexagon * hexagon: don't forget to delete the backend on free * hexagon: set/get_tensor size assert apply only to quantized tensors * hexagon: reintroduce HEX_VERBOSE wrapper for GGML_LOG_DEBUG for now GGML_LOG_DEBUG is always enabled for test-backend-ops and the output gets in the way. Ideally we need a bit more finer log levels. * docs: typos in hexagon developer docs (libggm-...) * hexagon: overhaul error handling in the session/device allocation this should handle all failure paths in the session allocation. * hexagon: update cmake presets to enable fp16 vectors * hexagon: remove unused time_usec function * hexagon: don't forget to release buffer contexts * hexagon: fixed indents in hvx-utils (missed clang-format auto-format failure) * hexagon: remove custom can_repeat function and use ggml_can_repeat --------- Co-authored-by: Rajdeep Ganguly <rganguly@qti.qualcomm.com> Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>	2025-10-22 13:47:09 -07:00
Sigbjørn Skjæret	84bf3c6778	model : add BailingMoeV2 support (#16063 ) * add BailingMoeV2 support * update llm types * undo * undo * update llm types * add model collection link * update * almost working * correct group selection and rename n_group_exp * avoid large top_k and use argmax instead for now if we had something like argmax2 that would be equivalent, but this works fine until then * poke * skip group selection when there are no tokens * fix 1T conversion * hopefully fixed expert group selection third time's the charm? * make expert group selection generally available The new LLaDA2Moe model uses this method too, make it generally available regardless of architecture. * allow n_expert_groups to be 1 (Kimi K2) * address review suggestions	2025-10-20 21:38:20 +02:00
Ron Evans	72d53e6c4d	readme: update bindings (#16651 ) Signed-off-by: deadprogram <ron@hybridgroup.com>	2025-10-20 11:20:04 +03:00
rtaluyev	27052978e4	readme : update bindings (#16144 ) Link to Java JNA bindings to llama.cpp native libraries	2025-09-25 18:20:34 +03:00
Aaron Teo	264f1b5187	zdnn: refactor codebase + add docs (#16178 ) * zdnn: initial matmul refactor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rm static from funcs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update ggml-zdnn.h Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: change header files to hpp Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: switch to common.hpp Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: move mulmat forward around Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rm inline from utils Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: add zDNN docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-09-23 14:53:05 +08:00
Georgi Gerganov	5c6106a696	contrib : update roles (#16113 ) * contrib : update roles * contrib : merge PR sections + add link to CI instructions Updated pull request guidelines for contributors and collaborators, and clarified merging practices for maintainers.	2025-09-22 10:58:02 +03:00
Jie Fu (傅杰)	4795c91c32	docs : add Hunyuan to models section (#15707 ) Signed-off-by: Jie Fu <jiefu@tencent.com>	2025-09-01 10:34:59 +03:00
Tarek Dakhran	e288693669	readme : model : mtdm : lfm2 improvements (#15476 ) * Support untied embeddings * Increase number of image tokens to 1024 * Add LFM2-VL to readme * Actually use untied embeddings	2025-08-22 09:29:08 +02:00
Georgi Gerganov	3007baf201	readme : update hot topics (#15397 )	2025-08-18 18:11:44 +03:00
Georgi Gerganov	1a01899b61	readme : update hot topics (#15315 )	2025-08-14 17:16:03 +03:00
Zagaj	27093afe78	readme : update infra list (#15234 )	2025-08-11 15:27:54 +03:00
Georgi Gerganov	be42642581	readme : update hot topics (#15097 )	2025-08-05 20:19:33 +03:00
Radoslav Gerganov	2ba1333b35	docs : fix backends table in README.md (#14796 )	2025-07-21 14:03:49 +02:00
Aman Gupta	2be60cbc27	docs : fix link for tools/perplexity in README.md (#14780 )	2025-07-20 20:13:47 +02:00
Reese Levine	21c021745d	ggml: Add initial WebGPU backend (#14521 ) * Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults * Initialize webgpu device * Making progress on setting up the backend * Finish more boilerplate/utility functions * Organize file and work on alloc buffer * Add webgpu_context to prepare for actually running some shaders * Work on memset and add shader loading * Work on memset polyfill * Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it * Implement get_tensor and buffer_clear * Finish rest of setup * Start work on compute graph * Basic mat mul working * Work on emscripten build * Basic WebGPU backend instructions * Use EMSCRIPTEN flag * Work on passing ci, implement 4d tensor multiplication * Pass thread safety test * Implement permuting for mul_mat and cpy * minor cleanups * Address feedback * Remove division by type size in cpy op * Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends * Fix name * Fix macos dawn prefix path	2025-07-16 18:18:51 +03:00
Tarek Dakhran	67eade1bf9	docs : add LFM2 to models section (#14650 ) * readme : add LFM2 to models section * fix copy paste...	2025-07-12 19:07:08 +02:00
Georgi Gerganov	aaa088d87f	readme : add hot PRs (#14636 ) * readme : add hot PRs * cont * readme : update title * readme : hot PRs links * cont	2025-07-11 16:07:55 +03:00
Georgi Gerganov	b7cc7745e3	readme : remove survey link (#14168 )	2025-06-13 11:55:44 +03:00
Georgi Gerganov	a681b4ba83	readme : remove project status link (#14149 )	2025-06-12 14:43:09 +03:00
Olexandr88	d01d112abb	readme : add badge (#13938 )	2025-06-05 10:50:55 +03:00
Xuan-Son Nguyen	ea1431b0fa	docs : add "Quick start" section for new users (#13862 ) * docs : add "Quick start" section for non-technical users * rm flox * Update README.md	2025-06-03 13:09:36 +02:00
ddh0	8726392d3d	readme : update bindings (#13950 )	2025-06-01 11:44:30 +03:00
Xuan-Son Nguyen	797990c4bc	mtmd : add ultravox audio input (#13623 ) * convert ok, load ok * warmup ok * test * still does not work? * fix padding * temporary give up * fix merge conflict * build_ultravox() * rm test * fix merge conflict * add necessary mtmd APIs * first working version (only 4s of audio) * will this monster compile? * fix compile * please compile * fPIC * fix windows * various fixes * clean up audio_helpers * fix conversion * add some debug stuff * long audio input ok * adapt the api * add --audio arg * final touch UX * add miniaudio to readme * fix typo * refactor kv metadata * mtmd_default_marker()	2025-05-22 20:42:48 +02:00
R0CKSTAR	33983057d0	musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647 ) * musa: fix build warning (unused parameter) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: upgrade MUSA SDK version to rc4.0.1 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Update ggml/src/ggml-cuda/cpy.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-05-21 09:58:49 +08:00
Xuan-Son Nguyen	06c1e4abc1	readme : add list of dependencies and their license (#13591 )	2025-05-16 20:04:18 +02:00
Xuan-Son Nguyen	33eff40240	server : vision support via libmtmd (#12898 ) * server : (experimental) vision support via libmtmd * mtmd : add more api around mtmd_image_tokens * mtmd : add more api around mtmd_image_tokens * mtmd : ability to calc image hash * shared_ptr for mtmd_image_tokens * move hash to user-define ID (fixed) * abstract out the batch management * small fix * refactor logic adding tokens to batch * implement hashing image * use FNV hash, now hash bitmap instead of file data * allow decoding image embedding to be split into batches * rm whitespace * disable some features when mtmd is on * fix --no-mmproj-offload * mtmd_context_params no timings * refactor server_inp to server_tokens * fix the failing test case * init * wip * working version * add mtmd::bitmaps * add test target * rm redundant define * test: mtmd_input_chunks_free * rm outdated comment * fix merging issue * explicitly create mtmd::input_chunks * mtmd_input_chunk_copy * add clone() * improve server_input struct * clip : fix confused naming ffn_up and ffn_down * rm ffn_i/o/g naming * rename n_embd, n_ff * small fix * no check n_ff * fix detokenize * add const to various places * add warning about breaking changes * add c api * helper: use mtmd_image_tokens_get_n_pos * fix ctx_shift * fix name shadowing * more strict condition * support remote image_url * remote image_url log * add CI test * do not log base64 * add "has_multimodal" to /props * remove dangling image * speculative: use slot.cache_tokens.insert * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * rm can_be_detokenized * on prmpt processing done, assert cache_tokens.size * handle_completions_impl returns void * adapt the new web ui * update docs and hot topics * rm assert * small fix (2) --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-05-09 19:29:37 +02:00
Diego Devesa	1d36b3670b	llama : move end-user examples to tools directory (#13249 ) * llama : move end-user examples to tools directory --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-05-02 20:27:13 +02:00
Xuan-Son Nguyen	00e3e5a194	mtmd : add qwen2vl and qwen2.5vl (#13141 ) * llava : add clip_n_output_tokens, deprecate clip_n_patches * mtmd : add qwen2vl and qwen2.5vl * decode_embd_batch::set_position_... * working version * deprecate llama-qwen2vl-cli * correct order W, H of clip_embd_nbytes_by_img * edit existing line in hot topics	2025-04-29 11:47:04 +02:00
Georgi Gerganov	d0a417f3c7	readme : update hot topics (#13150 )	2025-04-28 12:10:18 +03:00
Xuan-Son Nguyen	84a9bf2fc2	mtmd : merge llava, gemma3 and minicpmv CLI into single `llama-mtmd-cli` (#13012 ) * mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli` * support for minicpmv * remove cpp files of llava and minicpmv * update hot topics * mtmd : add not supported msg for qwen2vl * Update examples/llava/mtmd.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-04-21 15:32:58 +02:00
tastelikefeet	b2034c2b55	contrib: support modelscope community (#12664 ) * support download from modelscope * support login * remove comments * add arguments * fix code * fix win32 * test passed * fix readme * revert readme * change to MODEL_ENDPOINT * revert tail line * fix readme * refactor model endpoint * remove blank line * fix header * fix as comments * update comment * update readme --------- Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc/com>	2025-04-11 14:01:56 +02:00
Yuxuan Zhang	06bb53ad9b	llama-model : add Glm4Model implementation for GLM-4-0414 (#12867 ) * GLM-4-0414 * use original one * Using with tensor map * fix bug * change order * change order * format with flask8	2025-04-11 12:10:10 +02:00
Georgi Gerganov	47277d6d1d	readme : add rpc backend (#12842 )	2025-04-09 10:54:42 +03:00
Daniel Bevenius	348888e0dc	docs : add XCFramework section to README.md [no ci] (#12746 ) This commit adds a new section to the README.md file, detailing the usage of the XCFramework. The motivation for this is that it might not be immediately clear to users how to use the XCFramework in their projects and hopefully this will help.	2025-04-04 10:24:12 +02:00
Sigbjørn Skjæret	2c3f8b850a	llama : support BailingMoE (Ling) (#12634 )	2025-03-30 22:21:03 +02:00
Juyoung Suk	b3de7cac73	llama : add Trillion 7B model support (#12556 ) * Support Trillion 7B * Update llama.h * Update llama.h * Update llama-vocab.cpp for Trillion * Update llama-vocab.cpp	2025-03-30 20:38:33 +02:00
John Bean	89b2b56e86	readme: added Sidekick to available UIs (#12311 )	2025-03-10 16:13:09 +02:00
Lucas Moura Belo	3d652bfddf	readme : update bindings (#12229 )	2025-03-06 21:15:13 +02:00
Olivier Chafik	669912d9a5	`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034 ) * sampler: turn lazy grammar trigger words to regexes * add scripts/tool_bench.sh & .py * constrain llama json output regardless of function name if matches at beginning * update relaxed newline space rule in grammar tests * support add_generation_prompt query parameter (useful for /apply_template) * Update src/llama-grammar.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-03-05 13:05:13 +00:00
Georgi Gerganov	20a9b8f5e1	readme : fix roadmap link (#12185 )	2025-03-04 18:42:44 +02:00
Kante Yin	53e4db1012	readme : update infra list (#9096 ) Signed-off-by: kerthcet <kerthcet@gmail.com>	2025-02-26 09:49:36 +02:00
Georgi Gerganov	c2cd24fbfd	readme : add notice about new package registry (#11890 ) * readme : add notice about new package registry * cont : fix whitespace	2025-02-15 20:29:56 +02:00
Georgi Gerganov	68ff663a04	repo : update links to new url (#11886 ) * repo : update links to new url ggml-ci * cont : more urls ggml-ci	2025-02-15 16:40:57 +02:00
Georgi Gerganov	04045bb842	readme : minor	2025-02-14 00:16:56 +02:00
Daniel Bevenius	c48f630d1c	llama : add --completion-bash option (#11846 ) This commit adds a new option `--completion-bash` to the llama.cpp which outputs a source-able bash completion script. The motivation for this change is to provide a more user-friendly experience for users who use the command-line interface of llama.cpp. This is currently only basic and all options are displayed for all llama executables but this can be improved in the future if needed. Example usage: ```console $ build/bin/llama-cli --completion-bash > ~/.llama-completion.bash $ source ~/.llama-completion.bash $ ./build/bin/llama-server --m<TAB> --main-gpu --mirostat --mirostat-lr --model --multiline-input --min-p --mirostat-ent --mlock --model-url ```	2025-02-13 14:46:59 +01:00
lhez	4078c77f98	docs: add OpenCL (#11697 )	2025-02-11 15:04:13 -07:00

475 Commits