Commit Graph

7364 Commits

Author SHA1 Message Date
Saba Fallah aaf2fd17bb minor: editorconfig-check fix 2025-12-11 07:31:08 +01:00
Saba Fallah ed944cd25b fix: test-1.jpg OCR issue at small (640) resolution
set min resolution to base (1024) and max to large (1280) for dynamic resolution
2025-12-10 20:20:55 +01:00
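
Note: a minimal sketch of the dynamic-resolution clamping described above, with a hypothetical helper name (clamp_resolution); the 1024/1280 bounds are the base/large values from the commit message, and the actual mtmd preprocessing code may differ:

    #include <algorithm>

    // Hypothetical sketch: snap a target side length into the dynamic-resolution
    // window [base, large], so a small input (e.g. 640) is upscaled to at least
    // the base resolution before OCR. Not the actual mtmd code.
    int clamp_resolution(int side, int base = 1024, int large = 1280) {
        return std::clamp(side, base, large);  // clamp_resolution(640) == 1024
    }
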
bluebread 016140699f mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template 2025-12-09 16:31:44 +00:00
bluebread 5174a1e69a mtmd: minor fix 2025-12-08 04:54:19 +00:00
bluebread 48c6cf2132 mtmd: convert model in FP16 2025-12-08 02:36:00 +00:00
bluebread 53273f83f8 mtmd: fixed wrong input setting 2025-12-07 23:58:22 +00:00
bluebread 5dfcc5abb1 mtmd: add detailed comments for resize_bicubic_pillow 2025-12-07 10:15:09 +00:00
bluebread 2d918b3e21 mtmd: make sam hparams configurable 2025-12-06 06:55:53 +00:00
bluebread 15f2ada0ed mtmd: simplify get_rel_pos 2025-12-06 06:32:41 +00:00
Saba Fallah 705394c27a minor editorconfig-check fixes 2025-12-05 13:27:52 +01:00
Saba Fallah d981f19e9d minor editorconfig-check fixes 2025-12-05 13:18:15 +01:00
Saba Fallah 1c88647ec6 fixed flake8 lint issues 2025-12-05 12:24:10 +01:00
Saba Fallah 5f2ee1aecf
Merge branch 'ggml-org:master' into sf/deepseek-ocr 2025-12-05 11:56:06 +01:00
Saba Fallah 6687b4e746
Merge pull request #9 from sfallah/sf/deepseek-ocr-attn
using common build_attn in sam
2025-12-05 09:32:14 +01:00
Saba Fallah f5bd310a5e minor formatting and style 2025-12-05 09:30:58 +01:00
Johannes Gäßler e95d0bc8fd
CUDA: fix FA VKQ accumulator overflow (#17746) 2025-12-05 09:18:10 +01:00
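
Note: for context on the class of bug fixed here, a half-precision accumulator overflows to infinity once a running sum passes fp16's maximum finite value (65504), which is why long attention accumulations are kept in float. A standalone illustration, assuming a compiler with _Float16 support (recent GCC/Clang); this is not the kernel code:

    #include <cstdio>

    int main() {
        _Float16 acc_h = 0;  // fp16 accumulator, max finite value ~65504
        float    acc_f = 0;  // float accumulator, the usual fix
        for (int i = 0; i < 100; ++i) {
            acc_h += (_Float16) 1000.0f;  // overflows to inf after ~66 steps
            acc_f += 1000.0f;             // no problem in float
        }
        printf("half: %f  float: %f\n", (double) acc_h, (double) acc_f);
        return 0;
    }
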
Jiacheng (Jason) Chen 668ed76574
HIP: enable WMMA-MMQ INT kernels for RDNA 3 (#17576)
* enabled wmma instructions for most quantizations other than q2k

* fixed the last q2_k test case failure

* address comments: fix out-of-bounds write for RDNA4, add comments after #endif

* clean up rebase: fix ne error in half2

* fix the EditorConfig CI
2025-12-05 09:17:37 +01:00
bluebread d0c08e36a5 mtmd: minor fix 2025-12-05 04:03:56 +00:00
Sigbjørn Skjæret 03d9a77b85
ci : transform release binary root dir in tar to llama-bXXXX (#17773)
* transform release binary root dir in tar to llama-bXXXX

* bsdtar supports -s instead of --transform
2025-12-05 01:50:19 +01:00
Gabe Goodhart 3143a755c8
docs : update ops.md (Metal, BLAS) (#17768)
* docs: Regen Metal.csv

Branch: UpdateOpsMd

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* docs: Regen BLAS.csv

Branch: UpdateOpsMd

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* docs: Update ops.md

Branch: UpdateOpsMd

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-12-05 00:55:34 +01:00
Saba Fallah 076138a428 corrected code branch when flash-attn is disabled
enabling use of the --flash-attn option
2025-12-04 23:45:59 +01:00
Saba Fallah 5381b9cf63 using common build_attn in sam 2025-12-04 23:13:29 +01:00
Saba Fallah 4d7d9945f8
Merge pull request #8 from sfallah/sf/deepseek-ocr-cleanup
Sf/deepseek ocr cleanup
2025-12-04 22:20:39 +01:00
Piotr Wilkin (ilintar) 96fe9badfc
Add support for CUMSUM and TRI for CUDA. (#17584)
* Add support for CUMSUM and TRI for CUDA.

* Minor optimizations.

* Correct warp_prefix_inclusive_sum in float2 variant to return float2

* Optimize TRI

* Whitespace

* Fix strides.

* Implement double loop

* Whitespace

* Fix HIP compilation bugs

* Optimizations + big case performance tests

* Implement using CUB with fallback to custom kernel

* Remove error message.

* Fixes from code review

* Comment out CPU-unsupported F16/BF16 cases to fix CI

* Fine, you win :P

* Fix last cast, use NO_DEVICE_CODE and GGML_UNUSED_VARS

* Vary warp-size based on physical warp size

* Add GGML_UNUSED_VARS in tri as well

* Use constexpr and call prefix_inclusive with warp_size template param

* Update ggml/src/ggml-cuda/cumsum.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Change to tid % warp_size

* Fix strides; hardcode mask; add ggml_lane_mask_t

* Missing renames, remove unused get_warp_mask(), explicit calls to ggml_cuda_info()

* Too hasty...

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-12-04 22:19:51 +01:00
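
Note: the core of CUMSUM is an inclusive prefix sum. A scalar C++ reference for the semantics (the CUDA kernel computes this per row with warp shuffles, or CUB where available, rather than a serial loop):

    #include <cstdio>
    #include <vector>

    // Inclusive prefix sum: y[i] = x[0] + ... + x[i].
    std::vector<float> cumsum(const std::vector<float> & x) {
        std::vector<float> y(x.size());
        float acc = 0.0f;
        for (size_t i = 0; i < x.size(); ++i) {
            acc += x[i];
            y[i] = acc;
        }
        return y;
    }

    int main() {
        for (float v : cumsum({1, 2, 3, 4})) printf("%g ", v);  // 1 3 6 10
        printf("\n");
        return 0;
    }
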
bluebread fc3f625fef mtmd: support combined QKV projection in build_vit 2025-12-04 17:57:43 +00:00
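
Note: a combined QKV projection runs one matmul against a fused weight and then slices the result, instead of three separate matmuls. A hypothetical sketch of the split step in plain C++ (names illustrative, not the mtmd code):

    #include <vector>

    struct QKV { std::vector<float> q, k, v; };

    // Split one fused 3*d output vector (per token) into Q, K and V by offset.
    QKV split_qkv(const std::vector<float> & fused, int d) {
        return {
            { fused.begin(),         fused.begin() +     d },  // Q
            { fused.begin() +     d, fused.begin() + 2 * d },  // K
            { fused.begin() + 2 * d, fused.begin() + 3 * d },  // V
        };
    }
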
Gabe Goodhart bde188d60f
metal: TRI, FILL, EXPM1, SOFTPLUS (#16623)
* feat(wip): Port initial TRI impl from previous work

The kernel does not work and is not optimized, but the
code compiles and runs, so this will be the starting point
now that the core op has been merged.

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Remove argument for constant val override

This was added in the original draft, but later removed. With this, the
kernel now passes tests.

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Move the ttype conditional to templating to avoid conditional in kernel

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Type fixes

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* feat: Add softplus for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Add EXPM1 for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Add FILL for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* refactor: Branchless version of tri using _ggml_vec_tri_cmp as a mask

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Remove unused arguments

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* refactor: Use select instead of branch for softplus non-vec

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-04 19:12:19 +02:00
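
Note: scalar C++ equivalents of the four ops this commit ports to Metal, for reference; the real versions are Metal kernels, and the branchless tri there uses a compare-and-select mask, approximated below with a ternary:

    #include <cmath>

    // softplus(x) = log(1 + e^x); the cutoff keeps exp() from overflowing for large x
    float softplus(float x) { return x > 20.0f ? x : std::log1p(std::exp(x)); }

    // expm1(x) = e^x - 1, accurate for x near zero
    float expm1_op(float x) { return std::expm1(x); }

    // fill: set every element to a constant
    void fill(float * dst, int n, float val) { for (int i = 0; i < n; ++i) dst[i] = val; }

    // tri (lower-triangular variant): keep elements on/below the diagonal,
    // using the row/column comparison itself as the mask
    void tri_lower(float * dst, const float * src, int n) {
        for (int r = 0; r < n; ++r)
            for (int c = 0; c < n; ++c)
                dst[r * n + c] = (c <= r) ? src[r * n + c] : 0.0f;
    }
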
bluebread 2dd9924076 Merge branch 'sf/deepseek-ocr-cleanup' of github.com:sfallah/llama.cpp into sf/deepseek-ocr-cleanup 2025-12-04 16:52:00 +00:00
bluebread c89171cf4d mtmd: fixed bad ocr check in Deepseek2 (LM) 2025-12-04 16:50:05 +00:00
Xuan-Son Nguyen 9d0229967a
server: strip content-length header on proxy (#17734) 2025-12-04 16:32:57 +01:00
Saba Fallah 0399ddf145 reverting automatically removed spaces 2025-12-04 16:16:59 +01:00
Saba Fallah a661c52990 reverting automatically removed spaces 2025-12-04 16:12:41 +01:00
Xuan-Son Nguyen c4c10bfb86
server: move msg diffs tracking to HTTP thread (#17740)
* server: move msg diffs tracking to HTTP thread

* wip

* tool call tests ok

* minor : style

* cont : fix

* move states to server_response_reader

* add safe-guard

* fix

* fix 2

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-04 15:46:08 +01:00
Saba Fallah c73748ab5d Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr-cleanup
# Conflicts:
#	gguf-py/gguf/tensor_mapping.py
2025-12-04 15:09:32 +01:00
Saba Fallah 386ba479a2 clean up 2025-12-04 15:05:58 +01:00
bluebread 7451b84105 mtmd: fix tensor names for image newlines and view separator 2025-12-04 13:26:53 +00:00
Daniel Bevenius 817d743cc1
examples : add missing code block end marker [no ci] (#17756)
This commit adds the missing code block end marker in simple-cmake-pkg
to correct the formatting.
2025-12-04 14:17:30 +01:00
Daniel Bevenius bd4ef13476
common : skip model validation when --help is requested (#17755)
This commit skips the model validation check when the user specifies the
--help option.

The motivation for this is that currently an error is thrown before
--help can be processed. Validation is now skipped if params.usage is set,
allowing help to display without requiring --model.

Resolves: https://github.com/ggml-org/llama.cpp/issues/17754
2025-12-04 13:36:50 +01:00
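
Note: a minimal sketch of the guard described above, with hypothetical field names (usage, model); the actual parsing code in common/ may differ:

    #include <stdexcept>
    #include <string>

    struct params_sketch {
        bool        usage = false;  // set when --help is passed
        std::string model;          // path given via --model
    };

    // Skip the --model requirement entirely when --help was requested,
    // so the usage text can be printed without an error.
    void validate(const params_sketch & params) {
        if (params.usage) {
            return;
        }
        if (params.model.empty()) {
            throw std::invalid_argument("error: --model is required");
        }
    }
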
Alberto Cabrera Pérez 87a2084c45
ggml-cpu : remove asserts always evaluating to false (#17728) 2025-12-04 13:16:38 +01:00
SmartestWashingMachine 3659aa28e9
convert: use existing local chat_template if mistral-format model has one. (#17749)
* conversion: use existing local chat_template.jinja file if mistral-format model has one.

* fix --mistral-format mistakenly assuming some <=v7 chat template names are file paths and reading them.

* Update convert_hf_to_gguf.py - change from exists() to is_file()

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-04 12:12:45 +01:00
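
Note: the exists() -> is_file() change matters because exists() is also true for directories. The same distinction expressed with C++ std::filesystem (the actual fix is Path.is_file() in the Python converter):

    #include <filesystem>

    // is_regular_file() accepts only real files; exists() would also pass
    // for a directory named chat_template.jinja.
    bool has_local_chat_template(const std::filesystem::path & p) {
        return std::filesystem::is_regular_file(p);
    }
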
Adrien Gallouët 2a73f81f8a
cmake : simplify build info detection using standard variables (#17423)
The current approach has several drawbacks. Most notably, when
cross-compiling, invoking the compiler binary directly to query the
machine hardware can behave unexpectedly depending on the toolchain
wrapper (COMPILER_TARGET, CFLAGS, etc.).

As CMake is the official tool to build llama.cpp, I propose to only rely
on it to get those variables (`CMAKE_SYSTEM_NAME` and
`CMAKE_SYSTEM_PROCESSOR`).

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-04 12:42:13 +02:00
Sigbjørn Skjæret 7dba049b07
ci : disable ggml-ci-x64-amd-* (#17753) 2025-12-04 11:25:08 +01:00
Adrien Gallouët 83c1171529
common: use native MultiByteToWideChar (#17738)
`std::codecvt_utf8<wchar_t>` is deprecated and produces warnings:

    common/common.cpp:792:31: warning: 'codecvt_utf8<wchar_t>' is deprecated [-Wdeprecated-declarations]
      792 |     std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
          |

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-04 12:06:49 +02:00
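
Note: a minimal sketch of the native replacement, with a helper name of my own (utf8_to_wstring); the llama.cpp implementation may differ in details:

    #ifdef _WIN32
    #include <windows.h>
    #include <string>

    // UTF-8 -> UTF-16 via the Win32 API instead of the deprecated
    // std::codecvt_utf8<wchar_t>: first query the length, then convert.
    static std::wstring utf8_to_wstring(const std::string & s) {
        const int n = MultiByteToWideChar(CP_UTF8, 0, s.c_str(), (int) s.size(), nullptr, 0);
        std::wstring w(n, L'\0');
        MultiByteToWideChar(CP_UTF8, 0, s.c_str(), (int) s.size(), w.data(), n);
        return w;
    }
    #endif
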
Georgi Gerganov 0d1324856f
metal : use params per pipeline instance (#17739) 2025-12-04 10:34:11 +02:00
Georgi Gerganov a67ef0f47f
llama : fix sanity checks during quantization (#17721) 2025-12-04 10:33:42 +02:00
Adrien Gallouët ef75a89fdb
build : move _WIN32_WINNT definition to headers (#17736)
Previously, cmake was forcing `_WIN32_WINNT=0x0A00` for MinGW builds.
This caused "macro redefined" warnings with toolchains that already define the version.

This also removes the `GGML_WIN_VER` variable as it is no longer needed.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-04 07:04:02 +01:00
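
Note: the usual header-side pattern for this, sketched below (the exact ggml header may differ):

    // Define the Windows API version only if the toolchain has not already,
    // avoiding the "macro redefined" warnings described above.
    #if defined(_WIN32) && !defined(_WIN32_WINNT)
    #define _WIN32_WINNT 0x0A00  // target Windows 10
    #endif
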
Jeff Bolz d8b5cdc4fe
build: enable parallel builds in msbuild using MTT (#17708)
* build: enable parallel builds in msbuild using MTT

* check LLAMA_STANDALONE
2025-12-03 22:42:29 -06:00
Herman Semenoff dea9ba27cb
ggml-cpu: remove duplicate conditional check 'iid' (#17650) 2025-12-04 05:03:19 +08:00
Piotr Wilkin (ilintar) c6d1a00aa7
Add a couple of file types to the text section (#17670)
* Add a couple of file types to the text section

* Format + regenerate index

* Rebuild after rebase
2025-12-03 21:45:06 +01:00
SmartestWashingMachine 424c579455
convert : support latest mistral-common (fix conversion with --mistral-format) (#17712)
* fix convert_hf_to_gguf.py failing with --mistral-format using later mistral-common versions.

* use get_one_valid_tokenizer_file from mistral-common if available and fall back to old logic otherwise.

* use file name instead of file path for get_one_valid_tokenizer_file.

* fix --mistral-format tokenizer file failing for tokenizers in subdirectories.

* move get_one_valid_tokenizer_file import to avoid nested try-except.
2025-12-03 21:15:04 +01:00
Aleksander Grygier e9f9483464
Use OpenAI-compatible `/v1/models` endpoint by default (#17689)
* refactor: Data fetching via stores

* chore: update webui build output

* refactor: Use OpenAI compat `/v1/models` endpoint by default to list models

* chore: update webui build output

* chore: update webui build output
2025-12-03 20:49:09 +01:00