Commit Graph

7364 Commits

Author SHA1 Message Date
Saba Fallah aaf2fd17bb minor: editorconfig-check fix 2025-12-11 07:31:08 +01:00
Saba Fallah ed944cd25b fix: test-1.jpg OCR issue at small (640) resolution
set min resolution to base (1024) and max to large (1280) for dynamic resolution
2025-12-10 20:20:55 +01:00
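
Note: a minimal sketch of the dynamic-resolution clamping described above, with a hypothetical helper name (clamp_resolution); the 1024/1280 bounds are the base/large values from the commit message, and the actual mtmd preprocessing code may differ:

    #include <algorithm>

    // Hypothetical sketch: snap a target side length into the dynamic-resolution
    // window [base, large], so a small input (e.g. 640) is upscaled to at least
    // the base resolution before OCR. Not the actual mtmd code.
    int clamp_resolution(int side, int base = 1024, int large = 1280) {
        return std::clamp(side, base, large);  // clamp_resolution(640) == 1024
    }
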
bluebread 016140699f mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template 2025-12-09 16:31:44 +00:00
bluebread 5174a1e69a mtmd: minor fix 2025-12-08 04:54:19 +00:00
bluebread 48c6cf2132 mtmd: convert model in FP16 2025-12-08 02:36:00 +00:00
bluebread 53273f83f8 mtmd: fixed wrong input setting 2025-12-07 23:58:22 +00:00
bluebread 5dfcc5abb1 mtmd: add detailed comments for resize_bicubic_pillow 2025-12-07 10:15:09 +00:00
bluebread 2d918b3e21 mtmd: make sam hparams configurable 2025-12-06 06:55:53 +00:00
bluebread 15f2ada0ed mtmd: simplify get_rel_pos 2025-12-06 06:32:41 +00:00
Saba Fallah 705394c27a minor editorconfig-check fixes 2025-12-05 13:27:52 +01:00
Saba Fallah d981f19e9d minor editorconfig-check fixes 2025-12-05 13:18:15 +01:00
Saba Fallah 1c88647ec6 fixed flake8 lint issues 2025-12-05 12:24:10 +01:00
Saba Fallah 5f2ee1aecf
Merge branch 'ggml-org:master' into sf/deepseek-ocr 2025-12-05 11:56:06 +01:00
Saba Fallah 6687b4e746
Merge pull request #9 from sfallah/sf/deepseek-ocr-attn
using common build_attn in sam
2025-12-05 09:32:14 +01:00
Saba Fallah f5bd310a5e minor formatting and style 2025-12-05 09:30:58 +01:00
Johannes Gäßler e95d0bc8fd
CUDA: fix FA VKQ accumulator overflow (#17746) 2025-12-05 09:18:10 +01:00
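
Note: for context on the class of bug fixed here, a half-precision accumulator overflows to infinity once a running sum passes fp16's maximum finite value (65504), which is why long attention accumulations are kept in float. A standalone illustration, assuming a compiler with _Float16 support (recent GCC/Clang); this is not the kernel code:

    #include <cstdio>

    int main() {
        _Float16 acc_h = 0;  // fp16 accumulator, max finite value ~65504
        float    acc_f = 0;  // float accumulator, the usual fix
        for (int i = 0; i < 100; ++i) {
            acc_h += (_Float16) 1000.0f;  // overflows to inf after ~66 steps
            acc_f += 1000.0f;             // no problem in float
        }
        printf("half: %f  float: %f\n", (double) acc_h, (double) acc_f);
        return 0;
    }
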
Jiacheng (Jason) Chen 668ed76574
HIP: enable WMMA-MMQ INT kernels for RDNA 3 (#17576)
* enabled wmma instructions for most quantizations other than q2k

* fixed the last q2_k test case failure

* address comments: fix out-of-bounds write for RDNA4, add comments after #endif

* clean up rebase: fix ne error in half2

* fix the EditorConfig CI
2025-12-05 09:17:37 +01:00
bluebread d0c08e36a5 mtmd: minor fix 2025-12-05 04:03:56 +00:00
Sigbjørn Skjæret 03d9a77b85
ci : transform release binary root dir in tar to llama-bXXXX (#17773)
* transform release binary root dir in tar to llama-bXXXX

* bsdtar supports -s instead of --transform
2025-12-05 01:50:19 +01:00
Gabe Goodhart 3143a755c8
docs : update ops.md (Metal, BLAS) (#17768)
* docs: Regen Metal.csv

Branch: UpdateOpsMd

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* docs: Regen BLAS.csv

Branch: UpdateOpsMd

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* docs: Update ops.md

Branch: UpdateOpsMd

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-12-05 00:55:34 +01:00
Saba Fallah 076138a428 corrected code branch when flash-attn is disabled
enabling use of the --flash-attn option
2025-12-04 23:45:59 +01:00
Saba Fallah 5381b9cf63 using common build_attn in sam 2025-12-04 23:13:29 +01:00
Saba Fallah 4d7d9945f8
Merge pull request #8 from sfallah/sf/deepseek-ocr-cleanup
Sf/deepseek ocr cleanup
2025-12-04 22:20:39 +01:00
Piotr Wilkin (ilintar) 96fe9badfc
Add support for CUMSUM and TRI for CUDA. (#17584)
* Add support for CUMSUM and TRI for CUDA.

* Minor optimizations.

* Correct warp_prefix_inclusive_sum in float2 variant to return float2

* Optimize TRI

* Whitespace

* Fix strides.

* Implement double loop

* Whitespace

* Fix HIP compilation bugs

* Optimizations + big case performance tests

* Implement using CUB with fallback to custom kernel

* Remove error message.

* Fixes from code review

* Comment out CPU-unsupported F16/BF16 cases to fix CI

* Fine, you win :P

* Fix last cast, use NO_DEVICE_CODE and GGML_UNUSED_VARS

* Vary warp-size based on physical warp size

* Add GGML_UNUSED_VARS in tri as well

* Use constexpr and call prefix_inclusive with warp_size template param

* Update ggml/src/ggml-cuda/cumsum.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Change to tid % warp_size

* Fix strides; hardcode mask; add ggml_lane_mask_t

* Missing renames, remove unused get_warp_mask(), explicit calls to ggml_cuda_info()

* Too hasty...

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-12-04 22:19:51 +01:00
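
Note: the core of CUMSUM is an inclusive prefix sum. A scalar C++ reference for the semantics (the CUDA kernel computes this per row with warp shuffles, or CUB where available, rather than a serial loop):

    #include <cstdio>
    #include <vector>

    // Inclusive prefix sum: y[i] = x[0] + ... + x[i].
    std::vector<float> cumsum(const std::vector<float> & x) {
        std::vector<float> y(x.size());
        float acc = 0.0f;
        for (size_t i = 0; i < x.size(); ++i) {
            acc += x[i];
            y[i] = acc;
        }
        return y;
    }

    int main() {
        for (float v : cumsum({1, 2, 3, 4})) printf("%g ", v);  // 1 3 6 10
        printf("\n");
        return 0;
    }
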
bluebread fc3f625fef mtmd: support combined QKV projection in build_vit 2025-12-04 17:57:43 +00:00
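
Note: a combined QKV projection runs one matmul against a fused weight and then slices the result, instead of three separate matmuls. A hypothetical sketch of the split step in plain C++ (names illustrative, not the mtmd code):

    #include <vector>

    struct QKV { std::vector<float> q, k, v; };

    // Split one fused 3*d output vector (per token) into Q, K and V by offset.
    QKV split_qkv(const std::vector<float> & fused, int d) {
        return {
            { fused.begin(),         fused.begin() +     d },  // Q
            { fused.begin() +     d, fused.begin() + 2 * d },  // K
            { fused.begin() + 2 * d, fused.begin() + 3 * d },  // V
        };
    }
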
Gabe Goodhart bde188d60f
metal: TRI, FILL, EXPM1, SOFTPLUS (#16623)
* feat(wip): Port initial TRI impl from previous work

The kernel does not work and is not optimized, but the
code compiles and runs, so this will be the starting point
now that the core op has been merged.

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Remove argument for constant val override

This was added in the original draft, but later removed. With this, the
kernel now passes tests.

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Move the ttype conditional to templating to avoid conditional in kernel

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Type fixes

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* feat: Add softplus for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Add EXPM1 for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Add FILL for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* refactor: Branchless version of tri using _ggml_vec_tri_cmp as a mask

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Remove unused arguments

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* refactor: Use select instead of branch for softplus non-vec

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-04 19:12:19 +02:00
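
Note: scalar C++ equivalents of the four ops this commit ports to Metal, for reference; the real versions are Metal kernels, and the branchless tri there uses a compare-and-select mask, approximated below with a ternary:

    #include <cmath>

    // softplus(x) = log(1 + e^x); the cutoff keeps exp() from overflowing for large x
    float softplus(float x) { return x > 20.0f ? x : std::log1p(std::exp(x)); }

    // expm1(x) = e^x - 1, accurate for x near zero
    float expm1_op(float x) { return std::expm1(x); }

    // fill: set every element to a constant
    void fill(float * dst, int n, float val) { for (int i = 0; i < n; ++i) dst[i] = val; }

    // tri (lower-triangular variant): keep elements on/below the diagonal,
    // using the row/column comparison itself as the mask
    void tri_lower(float * dst, const float * src, int n) {
        for (int r = 0; r < n; ++r)
            for (int c = 0; c < n; ++c)
                dst[r * n + c] = (c <= r) ? src[r * n + c] : 0.0f;
    }
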
bluebread 2dd9924076 Merge branch 'sf/deepseek-ocr-cleanup' of github.com:sfallah/llama.cpp into sf/deepseek-ocr-cleanup 2025-12-04 16:52:00 +00:00
bluebread c89171cf4d mtmd: fixed bad ocr check in Deepseek2 (LM) 2025-12-04 16:50:05 +00:00
Xuan-Son Nguyen 9d0229967a
server: strip content-length header on proxy (#17734) 2025-12-04 16:32:57 +01:00
Saba Fallah 0399ddf145 reverting automatically removed spaces 2025-12-04 16:16:59 +01:00
Saba Fallah a661c52990 reverting automatically removed spaces 2025-12-04 16:12:41 +01:00
Xuan-Son Nguyen c4c10bfb86
server: move msg diffs tracking to HTTP thread (#17740)
* server: move msg diffs tracking to HTTP thread

* wip

* tool call tests ok

* minor : style

* cont : fix

* move states to server_response_reader

* add safe-guard

* fix

* fix 2

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-04 15:46:08 +01:00
Saba Fallah c73748ab5d Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr-cleanup
# Conflicts:
#	gguf-py/gguf/tensor_mapping.py
2025-12-04 15:09:32 +01:00
Saba Fallah 386ba479a2 clean up 2025-12-04 15:05:58 +01:00
bluebread 7451b84105 mtmd: fix tensor names for image newlines and view separator 2025-12-04 13:26:53 +00:00
Daniel Bevenius 817d743cc1
examples : add missing code block end marker [no ci] (#17756)
This commit adds the missing code block end marker in simple-cmake-pkg
to correct the formatting.
2025-12-04 14:17:30 +01:00
Daniel Bevenius bd4ef13476
common : skip model validation when --help is requested (#17755)
This commit skips the model validation check when the user specifies the
--help option.

The motivation for this is that currently an error is thrown before
--help can be processed. Validation is now skipped if params.usage is set,
allowing help to display without requiring --model.

Resolves: https://github.com/ggml-org/llama.cpp/issues/17754
2025-12-04 13:36:50 +01:00
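
Note: a minimal sketch of the guard described above, with hypothetical field names (usage, model); the actual parsing code in common/ may differ:

    #include <stdexcept>
    #include <string>

    struct params_sketch {
        bool        usage = false;  // set when --help is passed
        std::string model;          // path given via --model
    };

    // Skip the --model requirement entirely when --help was requested,
    // so the usage text can be printed without an error.
    void validate(const params_sketch & params) {
        if (params.usage) {
            return;
        }
        if (params.model.empty()) {
            throw std::invalid_argument("error: --model is required");
        }
    }
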
Alberto Cabrera Pérez 87a2084c45
ggml-cpu : remove asserts always evaluating to false (#17728) 2025-12-04 13:16:38 +01:00
SmartestWashingMachine 3659aa28e9
convert: use existing local chat_template if mistral-format model has one. (#17749)
* conversion: use existing local chat_template.jinja file if mistral-format model has one.

* fix --mistral-format mistakenly assuming some <=v7 chat template names are file paths and reading them.

* Update convert_hf_to_gguf.py - change from exists() to is_file()

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-04 12:12:45 +01:00
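
Note: the exists() -> is_file() change matters because exists() is also true for directories. The same distinction expressed with C++ std::filesystem (the actual fix is Path.is_file() in the Python converter):

    #include <filesystem>

    // is_regular_file() accepts only real files; exists() would also pass
    // for a directory named chat_template.jinja.
    bool has_local_chat_template(const std::filesystem::path & p) {
        return std::filesystem::is_regular_file(p);
    }
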
Adrien Gallouët 2a73f81f8a
cmake : simplify build info detection using standard variables (#17423)
The current approach has several drawbacks. Most notably, when
cross-compiling, invoking the compiler binary directly to query the
machine hardware can behave unexpectedly depending on the toolchain
wrapper (COMPILER_TARGET, CFLAGS, etc.).

As CMake is the official tool to build llama.cpp, I propose to only rely
on it to get those variables (`CMAKE_SYSTEM_NAME` and
`CMAKE_SYSTEM_PROCESSOR`).

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-04 12:42:13 +02:00
Sigbjørn Skjæret 7dba049b07
ci : disable ggml-ci-x64-amd-* (#17753) 2025-12-04 11:25:08 +01:00
Adrien Gallouët 83c1171529
common: use native MultiByteToWideChar (#17738)
`std::codecvt_utf8<wchar_t>` is deprecated and produces warnings:

    common/common.cpp:792:31: warning: 'codecvt_utf8<wchar_t>' is deprecated [-Wdeprecated-declarations]
      792 |     std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
          |

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-04 12:06:49 +02:00
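
Note: a minimal sketch of the native replacement, with a helper name of my own (utf8_to_wstring); the llama.cpp implementation may differ in details:

    #ifdef _WIN32
    #include <windows.h>
    #include <string>

    // UTF-8 -> UTF-16 via the Win32 API instead of the deprecated
    // std::codecvt_utf8<wchar_t>: first query the length, then convert.
    static std::wstring utf8_to_wstring(const std::string & s) {
        const int n = MultiByteToWideChar(CP_UTF8, 0, s.c_str(), (int) s.size(), nullptr, 0);
        std::wstring w(n, L'\0');
        MultiByteToWideChar(CP_UTF8, 0, s.c_str(), (int) s.size(), w.data(), n);
        return w;
    }
    #endif
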
Georgi Gerganov 0d1324856f
metal : use params per pipeline instance (#17739) 2025-12-04 10:34:11 +02:00
Georgi Gerganov a67ef0f47f
llama : fix sanity checks during quantization (#17721) 2025-12-04 10:33:42 +02:00
Adrien Gallouët ef75a89fdb
build : move _WIN32_WINNT definition to headers (#17736)
Previously, cmake was forcing `_WIN32_WINNT=0x0A00` for MinGW builds.
This caused "macro redefined" warnings with toolchains that already define the version.

This also removes the `GGML_WIN_VER` variable as it is no longer needed.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-04 07:04:02 +01:00
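
Note: the usual header-side pattern for this, sketched below (the exact ggml header may differ):

    // Define the Windows API version only if the toolchain has not already,
    // avoiding the "macro redefined" warnings described above.
    #if defined(_WIN32) && !defined(_WIN32_WINNT)
    #define _WIN32_WINNT 0x0A00  // target Windows 10
    #endif
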
Jeff Bolz d8b5cdc4fe
build: enable parallel builds in msbuild using MTT (#17708)
* build: enable parallel builds in msbuild using MTT

* check LLAMA_STANDALONE
2025-12-03 22:42:29 -06:00
Herman Semenoff dea9ba27cb
ggml-cpu: remove duplicate conditional check 'iid' (#17650) 2025-12-04 05:03:19 +08:00
Piotr Wilkin (ilintar) c6d1a00aa7
Add a couple of file types to the text section (#17670)
* Add a couple of file types to the text section

* Format + regenerate index

* Rebuild after rebase
2025-12-03 21:45:06 +01:00
SmartestWashingMachine 424c579455
convert : support latest mistral-common (fix conversion with --mistral-format) (#17712)
* fix convert_hf_to_gguf.py failing with --mistral-format using later mistral-common versions.

* use get_one_valid_tokenizer_file from mistral-common if available and fall back to old logic otherwise.

* use file name instead of file path for get_one_valid_tokenizer_file.

* fix --mistral-format tokenizer file failing for tokenizers in subdirectories.

* move get_one_valid_tokenizer_file import to avoid nested try-except.
2025-12-03 21:15:04 +01:00
Aleksander Grygier e9f9483464
Use OpenAI-compatible `/v1/models` endpoint by default (#17689)
* refactor: Data fetching via stores

* chore: update webui build output

* refactor: Use OpenAI compat `/v1/models` endpoint by default to list models

* chore: update webui build output

* chore: update webui build output
2025-12-03 20:49:09 +01:00