llama.cpp

Commit Graph

Author	SHA1	Message	Date
Sigbjørn Skjæret	65097181e4	fix **/x glob matching (#21129)	2026-03-28 22:27:38 +01:00
Piotr Wilkin (ilintar)	98ae0a0d36	common/parser: fix handling of tool definition with missing properties key (#21128)	2026-03-28 20:41:32 +01:00
Sigbjørn Skjæret	3a14a542f5	common : add character class support to glob_match (#21111) * add character class support to glob_match * remove pointless reference	2026-03-28 19:57:37 +01:00
BlueMöhre	968189729f	WebUI: Replace illegal nested button elements (#21026) * remove/replace nested button elements * map rest props to outer element * solve TODO * chore: update webui build output	2026-03-28 17:57:59 +01:00
Adrien	e397d3885c	common/json-schema: fix: handle non-capturing groups (?:...) in JSON schema pattern converter (#21124) The regex-to-grammar converter in _visit_pattern() crashes with SIGSEGV when a JSON schema "pattern" field contains a non-capturing group (?:...). Root cause: when the parser sees '(' followed by '?', it pushes a warning but does not advance past '?:'. The recursive transform() call then interprets '?' as a quantifier and calls seq.back() on an empty vector, causing undefined behavior. This commonly occurs when serving OpenAI-compatible tool calls from clients that include complex regex patterns in their JSON schemas (e.g., date validation patterns like ^(?:(?:\d\d[2468][048]|...)-02-29|...)$). The fix: - Skip '?:' after '(' to treat non-capturing groups as regular groups - For unsupported syntax (?=, ?!, etc.), skip to matching ')' safely, handling escaped characters to avoid miscounting parenthesis depth - Adjust the ')' unbalanced-parentheses check using direct char comparisons instead of substr - Add test cases for non-capturing groups (C++ only, as the JS/Python implementations do not yet support this syntax)	2026-03-28 17:55:38 +01:00
Aldehir Rojas	e6f2ec01ff	common : add reasoning_format = none support to gpt-oss (#21094)	2026-03-28 09:33:39 -05:00
Georgi Gerganov	edfb440a2f	server : fix processing of multiple back-to-back mtmd chunks (#21107)	2026-03-28 16:27:36 +02:00
Adrien Gallouët	3d66da1809	ci : gracefully shut down the server (#21110) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-28 14:49:57 +01:00
Woof Dog	82b703f8bc	Document custom default webui preferences in server README (#19771)	2026-03-28 14:19:16 +01:00
Aleksander Grygier	51a84efc53	webui: Conversation forking + branching improvements (#21021) * refactor: Make `DialogConfirmation` extensible with children slot * feat: Add conversation forking logic * feat: Conversation forking UI * feat: Update delete/edit dialogs and logic for forks * refactor: Improve Chat Sidebar UX and add MCP Servers entry * refactor: Cleanup * feat: Update message in place when editing leaf nodes * chore: Cleanup * chore: Cleanup * chore: Cleanup * chore: Cleanup * chore: Cleanup * chore: Cleanup * refactor: Post-review improvements * chore: update webui build output * test: Update Storybook test * chore: update webui build output * chore: update webui build output	2026-03-28 13:38:15 +01:00
Adrien Gallouët	b0f0dd3e51	vendor : update cpp-httplib to 0.40.0 (#21100) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-28 08:59:44 +01:00
Ruben Ortlam	0eb4764182	vulkan: add noncontiguous GLU support (#21081) * vulkan: add noncontiguous GLU support * fix compile issue	2026-03-28 08:44:56 +01:00
Piotr Wilkin (ilintar)	1f5d15e665	common/parser: fix reasoning whitespace bugs + extra parser tests (#21085) * fix whitespace reasoning issues + add reconstruction tests * Proper fix * fix Nemotron autoparser test expectations to include newline in marker	2026-03-28 07:29:26 +01:00
Sigbjørn Skjæret	c46758d28f	cli : add /glob command (#21084) * add /glob command * output error when max files reached * support globbing outside curdir	2026-03-28 02:33:04 +01:00
Ts-sound	bf934f28db	docker : fix and enable ARM64 image build (#20929) * CI: fix ARM64 image build error & enable compilation * Update .github/workflows/docker.yml Co-authored-by: Aaron Teo <taronaeo@gmail.com> * CI: revert ggml/src/ggml-cpu/CMakeLists.txt * Update .github/workflows/docker.yml Co-authored-by: Aaron Teo <taronaeo@gmail.com> * CI: update runs-on to ubuntu24.04, and update ARM64 build image (ubuntu_version: "24.04") * CI: change cpu.Dockerfile gcc to 14; * CI : cpu.Dockerfile, update pip install . * Update .github/workflows/docker.yml Co-authored-by: Aaron Teo <taronaeo@gmail.com> --------- Co-authored-by: Aaron Teo <taronaeo@gmail.com>	2026-03-28 01:45:09 +01:00
Adrien Gallouët	5c1a7b8355	server : add custom socket options to disable SO_REUSEPORT (#21056) * server : add custom socket options to disable SO_REUSEPORT Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Add --reuse-port $ strace -e trace=setsockopt,bind build/bin/llama-server -lv 2 --reuse-port setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 setsockopt(3, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0 bind(3, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 $ strace -e trace=setsockopt,bind build/bin/llama-server -lv 2 setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 bind(3, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Update tools/server/README.md (llama-gen-docs) Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Fix windows Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-28 01:12:43 +01:00
Aldehir Rojas	59d840209a	common : inhibit lazy grammar sampler while reasoning is active (#20970) * common : inhibit grammar while reasoning budget is active * cont : update force_pos in accept * cont : fix tests * cont : tweak should apply logic * cont : return early not using grammar sampler * Add tests * cont : prevent backend sampling when reasoning budget enabled * cont : fix typo --------- Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>	2026-03-27 18:30:40 +01:00
Kusha Gharahi	ff934e29bc	server: Introduce LLAMA_BUILD_WEBUI build flag to allow disabling the embedded web ui (#20158) * introduce LLAMA_SERVER_NO_WEBUI * LLAMA_SERVER_NO_WEBUI → LLAMA_BUILD_WEBUI * LLAMA_BUILD_WEBUI ON by default not based on LLAMA_STANDALONE * Missed this * Add useWebUi to package.nix	2026-03-27 17:25:55 +01:00
Yiwei Shao	ee051c1e4e	hexagon: support for IQ4_NL and MXFP4 (#21018) * ggml-hexagon: add IQ4_NL and MXFP4 HMX matmul support - Add IQ4_NL quantization type support to Hexagon backend (buffer set/get tensor repack, mul_mat, mul_mat_id dispatch) - Implement HVX IQ4_NL vec_dot kernels (1x1, 2x1, 2x2) with LUT-based 4-bit index to int8 kvalue dequantization - Add MXFP4 HMX dequantization path with E8M0 scale conversion, including batch-4 fast path and single-tile fallback - Unify quantized row size / scale offset logic to handle Q4_0, Q8_0, IQ4_NL, and MXFP4 in the DMA fetch path * ggml-hexagon: fix SKIP_QUANTIZE src1 address mismatch in mixed-quant models * Fix the pragma indent	2026-03-27 09:22:41 -07:00
Aleksander Grygier	e6f6770515	webui: Improve Chat Messages initial scroll + auto-scroll logic + add lazy loading with transitions to content blocks (#20999) * refactor: Always use agentic content renderer for Assistant Message * feat: Improve initial scroll + auto-scroll logic + implement fade in action for content blocks * chore: update webui build output	2026-03-27 17:01:36 +01:00
AN Long	48cda24c11	server: remove the verbose_prompt parameter (#21059) * server: respect the verbose_prompt parameter * Revert "server: respect the verbose_prompt parameter" This reverts commit `8ed885cf37`. * Remove --verbose-prompt parameter from llama-server * Using set_examples instead of set_excludes	2026-03-27 13:36:13 +02:00
Xuan-Son Nguyen	871f1a2d2f	mtmd: add more sanity checks (#21047)	2026-03-27 11:00:52 +01:00
Xuan-Son Nguyen	20197b6fe3	server: add built-in tools backend support (#20898) * wip: server_tools * refactor * displayName -> display_name * snake_case everywhere * rm redundant field * change arg to --tools all * add readme mention * llama-gen-docs	2026-03-27 10:07:11 +01:00
Radoslav Gerganov	ba38f3becc	rpc : proper handling of data pointers to CPU buffers (#21030) The compute graph may contain tensors pointing to CPU buffers. In these cases the buffer address is serialized as 0 and sent over the wire. However, the data pointer is serialized as-is and this prevents proper validation on the server side. This patch fixes this by serializing the data pointer as 0 for non-RPC buffers and doing proper validation on the server side. closes: #21006	2026-03-27 10:59:35 +02:00
mtmcp	37f230dd7c	completion : session_tokens insert range in completion tool (no-op → correct) (#20917) The embd.begin(), embd.begin() range is empty and inserts nothing, so session_tokens never gets updated after decoding. Should be embd.begin(), embd.end(). Introduced in commit `2b6dfe8`.	2026-03-27 09:25:58 +01:00
mtmcp	a308e584ca	completion : Fix segfault on model load failure (#21049)	2026-03-27 10:01:13 +02:00
Pascal	d0fa2c9fbb	Send reasoning content back to the model across turns via the reasoning_content API field (#21036) * webui: send reasoning_content back to model in context Preserve assistant reasoning across turns by extracting it from internal tags and sending it as a separate reasoning_content field in the API payload. The server and Jinja templates handle native formatting (e.g. <think> tags for Qwen, GLM, DeepSeek...). Adds "Exclude reasoning from context" toggle in Settings > Developer (off by default, so reasoning is preserved). Includes unit tests. * webui: add syncable parameter for excludeReasoningFromContext * chore: update webui build output	2026-03-27 08:17:35 +01:00
ren	9bcb4eff4d	metal : Fix dimension constraint violation in matmul2d descriptor (#21048) Updates Metal tensor API test probe to fix the dimension constraint violation in the matmul2d descriptor (at least one value must be a multiple of 16).	2026-03-27 09:05:21 +02:00
KokerZhou	6861f6509a	CANN: update docker images to 8.5.0 and improve CANN.md (#20801) * cann: update docker images to 8.5.0 - bump CANN base image from 8.3.rc2 to 8.5.0 - bump ASCEND_VERSION from 8.1.RC1.alpha001 to 8.5.0 Move to newer stable releases. * cann: update CANN.md * Update CANN.md to include BF16 support Added BF16 support information to the CANN documentation and corrected formatting for the installation instructions. * Fix formatting issues in CANN.md Fix 234: Trailing whitespace	2026-03-27 08:53:00 +08:00
Saba Fallah	1743d98057	mtmd: fix "v.patch_embd" quant and unsupported im2col ops on Metal for deepseek-ocr (#21027) * mtmd: fix "v.patch_embd" quant and unsupported im2col ops on Metal for deepseek-ocr * Update src/llama-quant.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-27 00:07:55 +01:00
uvos	7ca0c9cca7	hip: use fnuz fp8 for conversion on CDNA3 (#21040)	2026-03-26 23:06:33 +01:00
Xuan-Son Nguyen	8c60b8a2be	ci: pin external actions to exact commit SHA (#21033)	2026-03-26 20:44:00 +01:00
Adrien Gallouët	287b5b1eab	common : add getpwuid fallback for HF cache when HOME is not set (#21035) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-26 20:34:23 +01:00
Xuan-Son Nguyen	a73bbd5d92	mtmd: refactor image preprocessing (#21031) * mtmd: refactor image pre-processing * correct some places * correct lfm2 * fix deepseek-ocr on server * add comment to clarify about mtmd_image_preprocessor_dyn_size	2026-03-26 19:49:20 +01:00
lhez	ded446b34c	opencl: allow large buffer for adreno (#20997)	2026-03-26 08:52:21 -07:00
Michael Wand	f8d4abae86	convert : support Qwen3.5/Qwen3.5 Moe NVFP4 and add input scales (#20505) * convert : fix Qwen3.5 NVFP4 conversion * Updated copilot concerns and rebased * move into _LinearAttentionVReorderBase and simplify * --flake * new_name not needed * Added input_scale to gguf * Fixed input_scale addition as tensor * Added input scale to loader and named _in_s * Update convert_hf_to_gguf.py Re-removed input_scale from aux cleanup Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-26 16:52:06 +01:00
Pavel Zloi	3d5acab3e7	convert : add RuGPT3XL (RuGPT3XLForCausalLM) support (#21011) * Support of ruGPT3XL model added * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * chkhsh for ruGPT3XL model added * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Fixing chkhsh for ruGPT3XL, rerun updated and _qkv_parts in RuGPT3XLModel --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-26 16:49:09 +01:00
Adrien Gallouët	9900b29c3a	common : filter out imatrix when finding models (#21023) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-26 15:37:18 +01:00
ihb2032	dc8d14c582	fix(ggml): correct RISC-V ISA string canonical ordering for RVV in CMake (#20888) Signed-off-by: ihb2032 <hebome@foxmail.com>	2026-03-26 13:08:41 +02:00
Adrien Gallouët	93dfbc1291	common : make LLAMA_CACHE the one cache for everything (#21009) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-26 12:04:57 +01:00
Adrien Gallouët	3cba8bba18	common : fix split model migration (#21019) Sadly the manifest does not list all required files, I honestly thought it was the case. Without the files listed we don't have the sha256, so if the first file is valid, and all others have the correct size, then we can assume we are good and do the migration... Here is my test: $ find /home/angt/.cache/llama.cpp /home/angt/.cache/llama.cpp /home/angt/.cache/llama.cpp/angt_test-split-model-stories260K_stories260K-f32-00002-of-00002.gguf /home/angt/.cache/llama.cpp/angt_test-split-model-stories260K_stories260K-f32-00001-of-00002.gguf /home/angt/.cache/llama.cpp/angt_test-split-model-stories260K_stories260K-f32-00001-of-00002.gguf.etag /home/angt/.cache/llama.cpp/angt_test-split-model-stories260K_stories260K-f32-00002-of-00002.gguf.etag /home/angt/.cache/llama.cpp/manifest=angt=test-split-model-stories260K=latest.json $ build/bin/llama-server ================================================================================ WARNING: Migrating cache to HuggingFace cache directory Old cache: /home/angt/.cache/llama.cpp/ New cache: /home/angt/.cache/huggingface/hub This one-time migration moves models previously downloaded with -hf from the legacy llama.cpp cache to the standard HuggingFace cache. Models downloaded with --model-url are not affected. ================================================================================ migrate_file: migrated angt_test-split-model-stories260K_stories260K-f32-00001-of-00002.gguf -> /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db/stories260K-f32-00001-of-00002.gguf migrate_file: migrated angt_test-split-model-stories260K_stories260K-f32-00002-of-00002.gguf -> /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db/stories260K-f32-00002-of-00002.gguf migrate_old_cache_to_hf_cache: migration complete, deleting manifest: /home/angt/.cache/llama.cpp/manifest=angt=test-split-model-stories260K=latest.json $ find /home/angt/.cache/llama.cpp /home/angt/.cache/huggingface /home/angt/.cache/llama.cpp /home/angt/.cache/huggingface /home/angt/.cache/huggingface/hub /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/blobs /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/blobs/50d019817c2626eb9e8a41f361ff5bfa538757e6f708a3076cd3356354a75694 /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/blobs/7b273e1dbfab11dc67dce479deb5923fef27c39cbf56a20b3a928a47b77dab3c /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/refs /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/refs/main /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db/stories260K-f32-00002-of-00002.gguf /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db/stories260K-f32-00001-of-00002.gguf Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-26 12:04:37 +01:00
Michael Wand	112c78159f	ggml-cuda: Add NVFP4 dp4a kernel (#20644) Added check for dst_t to cuda_cast template for float Restored ggml_cuda_ue4m3_to_fp32, changed vecdot ints to int32ts Added CUDART/HIP Check and HIP/fp8 include Added NVFP4 to Test-backend-ops Added hip_fp8_e4m3 to __nv_fp8_e4m3 typedef --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2026-03-26 09:54:03 +01:00
SamareshSingh	0fac87b157	imatrix : fix crash when using --show-statistics with zero counts (#19532) * imatrix: fix crash when using --show-statistics with zero counts Fixes division by zero that caused floating point exceptions when processing imatrix files with zero count values. Added checks to skip zero counts and handle empty activation vectors. Fix for the bug #19190 * imatrix: lower log level for zero-count skip message to DBG	2026-03-26 08:14:36 +01:00
Yihao Wang	0a524f2404	CUDA & CPU: support F32 kernel type for `CONV_TRANSPOSE_2D` (#17094) * Refactor CUDA 2D transpose implementation to support multiple kernel types and improve parameter handling - Introduced a `conv2d_transpose_params` struct for better parameter management. - Updated `conv2d_transpose_kernel` to be templated for different kernel types (float and half). - Modified `ggml_cuda_conv_2d_transpose_p0` to handle both F16 and F32 kernel types. - Enhanced test cases to validate functionality for both kernel types. * Refactor test cases for 2D convolution transpose to support dynamic kernel types - Updated `test_conv_transpose_2d` structure to improve parameter handling by reordering constructor arguments. - Enhanced test case generation to iterate over kernel types, allowing for flexible testing of different configurations. - Removed hardcoded kernel type instances in favor of a loop for better maintainability and scalability. * Refactor ggml_compute_forward_conv_transpose_2d to support both F16 and F32 tensor types. * Refactor conv2d transpose kernel to use a template for kernel type, enhancing flexibility for different data types. Update test cases to include both F16 and F32 tensor types for comprehensive coverage. * Update ggml/src/ggml-cuda/conv2d-transpose.cu Co-authored-by: Aman Gupta <amangupta052@gmail.com> * Update ggml/src/ggml-cpu/ggml-cpu.c Co-authored-by: Aman Gupta <amangupta052@gmail.com> * Refactor conv2d transpose implementation by removing the conv2d_transpose_params struct and dispatching with direct kernel launch. * Enhance cpu conv2d transpose implementation by introducing a templated kernel type for improved flexibility with F16 and F32 data types. --------- Co-authored-by: Aman Gupta <amangupta052@gmail.com>	2026-03-26 10:19:14 +08:00
Adrien Gallouët	c0159f9c1f	common : do not delete old files from the old cache when updating (#21000) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-25 22:28:04 +01:00
Saba Fallah	a970515bdb	mtmd: Add DeepSeekOCR Support (#17400) * mtmd: llama.cpp DeepSeekOCR support init commit * loading sam tensors * mtmd: fix vision model processing * deepseek-ocr clip-vit model impl * mtmd: add DeepSeek-OCR LM support with standard attention * mtmd: successfully runs DeepSeek-OCR LM in llama-cli * mtmd: Fix RoPE type for DeepSeek-OCR LM. * loading LM testing Vision model loading * sam warmup working * sam erroneous return corrected * clip-vit: corrected cls_embd concat * clip-vit: model convert qkv_proj split * corrected combining of image encoders' results * fix: update callback for ffn_moe_weighted and add callback for attn_out in deepseek2 model * concat image_newline and image_seperator tokens * visual_model warmup (technically) works * window partitioning using standard ggml ops * sam implementation without using CPU only ops * clip: fixed warnings * Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr * mtmd: fix get_rel_pos * mtmd: fixed the wrong scaler for get_rel_pos * image encoding technically works but the output can't be checked since image decoding fails * mtmd: minor changed * mtmd: add native resolution support * - image encoding debugged - issues fixed mainly related wrong config like n_patches etc. - configs need to be corrected in the converter * mtmd: correct token order * - dynamic resizing - changes are concerning PR https://github.com/sfallah/llama.cpp/pull/4 * mtmd: quick fix token order * mtmd: fix dangling pointer * mtmd: SAM numerically works * mtmd: debug CLIP-L (vit_pre_ln) * mtmd: debug CLIP-L & first working DeepSeek-OCR model * mtmd : add --dsocr-mode CLI argument for DeepSeek-OCR resolution control & all native resolution modes work * mtmd: simplify SAM patch embedding * mtmd: adapt Pillow image resizing function * mtmd: simplify DeepSeek-OCR dynamic resolution preprocessing * mtmd: remove --dsocr-mode argument * mtmd: refactor code & remove unused helper functions * mtmd: fix tensor names for image newlines and view separator * clean up * reverting automatically removed spaces * reverting automatically removed spaces * mtmd: fixed bad ocr check in Deepseek2 (LM) * mtmd: support combined QKV projection in buid_vit * using common build_attn in sam * corrected code-branch when flash-attn disabled enabling usage of --flash-attn option * mtmd: minor fix * minor formatting and style * fixed flake8 lint issues * minor editorconfig-check fixes * minor editorconfig-check fixes * mtmd: simplify get_rel_pos * mtmd: make sam hparams configurable * mtmd: add detailed comments for resize_bicubic_pillow * mtmd: fixed wrong input setting * mtmd: convert model in FP16 * mtmd: minor fix * mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template * fix: test-1.jpg OCR issue with small (640) resolution setting min-resolution base (1024) max large (1280) for dynamic-resolution * minor: editorconfig-check fix * merge with changes from https://github.com/ggml-org/llama.cpp/pull/17909 added new opt to tests.sh to disable flash-attn * minor: editorconfig-check fix * testing deepseek-ocr quick and dirty test script comparing results of Qwen2.5-VL vs DeepSeek-OCR * quick and (potential) dirty merge with https://github.com/ggml-org/llama.cpp/pull/17909 * refactoring, one single builder function and static helpers * added deepseek-ocr test to tests.sh * minor formatting fixes * check with fixed expected results * minor formatting * editorconfig-check fix * merge with changes from https://github.com/ggml-org/llama.cpp/pull/18042 * minor - added GLM-4.6V to big tests - added missing deps for python test * convert: minor fix * mtmd: format code * convert: quick fix * convert: quick fix * minor python formatting * fixed merge build issue * merge resolved - fixed issues in convert - tested several deepseek models * minor fix * minor * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * - removed clip_is_deepseekocr - removed redundant RESIZE_ALGO_BICUBIC_PILLOW resize-algo - simplified image-preprocessing - removed/simplified debug functions * - cleaning commented out code * fixing instabilities issues reintroducing resize_bicubic_pillow * - use f16 model for deepseek-ocr test - ignore llama-arch test for deepseek-ocr * rename fc_w --> mm_fc_w * add links to OCR discussion * cleaner loading code * add missing .weight to some tensors * add default jinja template (to be used by server) * move test model to ggml-org * rolling back upscale change * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: bluebread <hotbread70127@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2026-03-25 19:57:40 +01:00
Adrien Gallouët	056b50c319	common : fix verbosity setup (#20989) The verbosity threshold was set at the end of common_params_parse_ex(), after doing many things (like downloading files..) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-25 19:41:01 +01:00
Adrien Gallouët	f2c72b8f1f	common : fix gguf selection in common_list_cached_models (#20996) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-25 19:18:06 +01:00
uvos	ec54ac13a8	ci : fix parsing of vgpr counts in hip-quality-check (#20987) * scripts: hip: gcn-cdna-vgpr-check: fix parsing of vgpr counts when an amdclang Remark block is interleaved with another from a different process * Return warning ignore * obey pep8 inline double space before inline comments * add # noqa: NP100 for other prints too * Add script changes to cause autotrigger	2026-03-25 19:00:37 +01:00
Saba Fallah	80322ebdaf	model: codefuse-ai/F2LLM-v2 support	2026-03-25 18:33:42 +01:00

8575 Commits