Commit Graph

  • ba9856be51
    Merge 59e4d45b19 into 15fa3c493b Wingless-Archangel 2026-04-24 13:11:08 +0200
  • 15fa3c493b
    metal : print GPU description (#22318) b8920 Georgi Gerganov 2026-04-24 13:56:03 +0300
  • c9db5fd595
    Merge branch 'master' into pr/20962 Georgi Gerganov 2026-04-24 13:43:31 +0300
  • 3229eb289a common/gemma4 : fix parsing of prefilled reasoning blocks Quair 2026-04-24 12:41:02 +0200
  • dc80c5252a
    common : fix jinja warnings with clang 21 (#22313) b8919 Adrien Gallouët 2026-04-24 12:36:02 +0200
  • 65ed8d4b5a
    Merge 71100d1f9a into e583f3b4f5 peizhang56 2026-04-24 12:25:34 +0200
  • 89f75531a6 fix missed -sm parameter arthw 2026-04-24 18:06:14 +0800
  • 632eaeaac0 address review: remove __ldg compat code, update ops docs for CUDA POOL_1D LeoYangXY 2026-04-24 16:48:06 +0800
  • ebdef3dabd CUDA: add POOL_1D LeoYangXY 2026-04-24 01:28:09 +0800
  • 467e285875
    metal : print GPU description Georgi Gerganov 2026-04-24 11:41:52 +0300
  • 0876586d7c Change so all logs are output before exit Nakasaka, Masato 2026-04-24 17:37:39 +0900
  • 5dd310df7e
    Merge branch 'master' into pr/20962 Georgi Gerganov 2026-04-24 11:24:16 +0300
  • 3d11a9b17f Stopped using static vector Nakasaka, Masato 2026-04-24 17:23:08 +0900
  • 75753d4c61
    cont : shorten line Georgi Gerganov 2026-04-24 11:16:10 +0300
  • 03efbbfc55
    cont : fix requirements heading in PR template Georgi Gerganov 2026-04-24 11:08:30 +0300
  • 9d15a5797f
    gitignore : add .pi + personal SYSTEM.md Georgi Gerganov 2026-04-24 10:51:09 +0300
  • e583f3b4f5
    ggml : minor coding style (#22308) b8918 Georgi Gerganov 2026-04-24 11:02:00 +0300
  • 017f090442
    jinja : remove unused header (#22310) b8917 Georgi Gerganov 2026-04-24 11:01:46 +0300
  • ffdd983fb8
    server : fix swa-full logic (#22288) b8916 Georgi Gerganov 2026-04-24 10:17:37 +0300
  • cbdb8c4c2b
    cmake: skip -Wmissing-noreturn for AppleClang 21.0.0 Georgi Gerganov 2026-04-24 10:08:33 +0300
  • de97a09387
    Merge 90fe2f9949 into 793d0a7931 vampyrebat 2026-04-24 06:48:51 +0000
  • 01f5ebd632 CANN: add new ops, optimize existing ops hipudding 2026-04-16 06:13:00 +0000
  • 02ac442415
    Merge 50360a9d90 into 793d0a7931 Kai Aoki 2026-04-24 15:42:05 +0900
  • 793d0a7931
    server: rename debug tags to match --cache-idle-slots naming (#22292) Yes You Can Have Your Own 2026-04-24 09:28:44 +0300
  • a0a105448b
    jinja : remove unused header Georgi Gerganov 2026-04-24 09:25:17 +0300
  • b3e1d207be
    Merge a79d977fbf into 8bc492ebb4 Rainlin 2026-04-24 14:17:28 +0800
  • 7242d207e3
    ggml : minor coding style Georgi Gerganov 2026-04-24 09:13:40 +0300
  • 5d89de5c3b opencl: simplify adreno q4_0 set_tensor Li He 2026-04-23 22:29:20 -0700
  • aa15fb80d3 opencl: use consistent names for adreno q4_0 gemm/gemv Li He 2026-04-23 22:25:30 -0700
  • 9e916cc393 opencl: use consistent name for adreno q8_0 gemm/gemv Li He 2026-04-23 22:13:24 -0700
  • 7980c68bb0 opencl: refactor q4_0 gemm/gemv loading, use consistent names Li He 2026-04-23 22:01:05 -0700
  • 33de14487d opencl: refactor adreno q4_0 gemm/gemv dispatch Li He 2026-04-23 21:41:46 -0700
  • 3a69daa044
    Merge e516cd0056 into 8bc492ebb4 Uttam 2026-04-24 07:00:44 +0200
  • 092b0f6fa7
    Merge 0e3a8a0c48 into 8bc492ebb4 En Yao 2026-04-24 07:00:44 +0200
  • a0a3f856fe merge binding when kv overlap Zheyuan Chen 2026-04-23 01:05:32 -0700
  • 4c0e94a869 formatting Zheyuan Chen 2026-04-22 18:09:49 -0700
  • ed8bb6cfb2 turn off skip_validation and address buffer overlapping when nwg==1 Zheyuan Chen 2026-04-22 18:08:15 -0700
  • 35add3da33 move path selection into the shader library and have the host consume a single flash-attn decision object. Zheyuan Chen 2026-04-22 17:28:45 -0700
  • e60c49bd28 make different bindings with same underlying buffer to have the same usage flags Zheyuan Chen 2026-04-22 01:41:11 -0700
  • 1fed0ee896 make row_max and exp_sum to local register Zheyuan Chen 2026-04-21 20:53:22 -0700
  • 28cf9535b5 remove Q_TILE as it is always 1 for vec path Zheyuan Chen 2026-04-21 20:33:34 -0700
  • 2a43134fb0 turn on subgroup uniformity check Zheyuan Chen 2026-04-21 18:22:19 -0700
  • a144c47d5f formatting Zheyuan Chen 2026-04-20 22:41:08 -0700
  • 6d2109983e ggml-webgpu: staging KV for flash attention tile version Zheyuan Chen 2026-04-20 22:12:40 -0700
  • e0f8d89031 ggml-webgpu: enable flash attention vec and tile version for browser Zheyuan Chen 2026-04-20 21:26:52 -0700
  • 39d7f280f7 ggml-webgpu: modify the vec path to discard the mnk parameter Zheyuan Chen 2026-04-20 20:57:56 -0700
  • 54974e3f3d ggml-webgpu: add new fields and discard usage of mnk for tile version Zheyuan Chen 2026-04-20 20:19:41 -0700
  • 47e4de3169 ggml-webgpu: add tile flash attention fallback Zheyuan Chen 2026-04-20 17:52:15 -0700
  • 7449c1d30a various small cleanups Scott Cutler 2026-04-23 21:24:11 -0700
  • 39db936875
    Merge cef52b529d into 8bc492ebb4 Anmol Jaiswal 2026-04-24 09:41:31 +0530
  • f3d321f2b0 Merge branch 'master' into dev/internal-allreduce Scott Cutler 2026-04-23 21:09:19 -0700
  • fbcae511bd rework a few checks/fallbacks Scott Cutler 2026-04-23 21:07:13 -0700
  • e7791fe57a
    Merge be961a18fc into 8bc492ebb4 FelixCLC 2026-04-23 23:05:56 -0500
  • 4fe7158b18 convert from dos to unix for format issue arthw 2026-04-24 11:27:07 +0800
  • 68bc7c863d
    Merge 49162df87a into 8bc492ebb4 Daniele Pinna 2026-04-24 11:23:01 +0800
  • 4830c849cc fix format issue arthw 2026-04-24 11:22:39 +0800
  • f046004c81 add chunked mode to the kernel for unlimited vector size Scott Cutler 2026-04-23 20:19:11 -0700
  • fce3e1af88
    Merge 874fd8953a into 8bc492ebb4 Anirudh Sathiya Narayanan 2026-04-23 19:53:09 -0700
  • 8a2c13bbdf
    Merge 2aa840df34 into 8bc492ebb4 Kusha Gharahi 2026-04-23 19:53:09 -0700
  • 333e897d7a
    Merge a46418fa27 into 8bc492ebb4 Jeremiah Blanchard 2026-04-23 19:53:09 -0700
  • ef77097c31
    Merge 76eecd3ad7 into 8bc492ebb4 JoongHyuk Shin 2026-04-23 19:53:09 -0700
  • e18eb87380
    Merge cd50aac85d into 8bc492ebb4 David Huggins-Daines 2026-04-23 19:46:47 -0700
  • eba652c651
    Merge e0da25a612 into 8bc492ebb4 kiwixz 2026-04-23 19:46:47 -0700
  • c68ba8e97e
    Merge 5a7baa85dd into 8bc492ebb4 Laitaps 2026-04-23 19:46:47 -0700
  • b19474cdb7 add MoE lazy expert loading with madvise-based memory management BRULE Herman 2026-04-23 22:30:13 -0400
  • 38f1bf4fa0
    Merge 17f341bc33 into 8bc492ebb4 Jan Ekström 2026-04-24 05:20:02 +0300
  • 4d7736c761 fix case where a given tensor has not been computed Scott Cutler 2026-04-23 19:15:08 -0700
  • c69c5fc9e1
    Merge d611fb43e8 into 8bc492ebb4 swetha097 2026-04-24 09:57:22 +0800
  • 8bc492ebb4
    hexagon: add SOLVE_TRI op (#21974) b8914 Mengsheng Wu 2026-04-24 09:39:13 +0800
  • df4232e535 server: apply [*] global preset from --models-preset to router params 許元豪 2026-04-24 09:01:02 +0800
  • 6a7e22be9d cli, server: apply --prio process priority setting 許元豪 2026-03-11 09:28:43 +0800
  • cad53b59c2 Fix the path for upload so CI doesn't fail Shreya Jain 2026-04-23 18:01:21 -0700
  • 2573b7b379 rework reduction provider init to not call ncclCommInitAll if using the internal provider Scott Cutler 2026-04-23 17:37:24 -0700
  • 191730a1b3 Added sanitize_field lambda in build_multipart_body for key, filename and content_type as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127104647 Ralph Paßgang 2026-04-24 02:36:22 +0200
  • 7a44d60071
    Merge 08a1f98eef into e5f070a1dc ingyukoh 2026-04-24 05:55:49 +0530
  • 1888d7f373 hmx flash-attn: fix p_tiles dual-tile OOB race; enable MT + pipeline Yiwei Shao 2026-04-21 21:37:04 -0700
  • 3bda059162 hmx flash-attn: fix prefill correctness (dst indexing, softmax reduce, V stride) Yiwei Shao 2026-04-20 17:41:36 -0700
  • c9cb19663f hmx flash-attn: replace asm clobber with targeted volatile reads on vtcm_d_tiles Yiwei Shao 2026-04-23 16:01:29 -0700
  • ae78a998c8 hmx flash-attn: refine cost model coefficients based on profiling data Yiwei Shao 2026-04-23 16:01:05 -0700
  • a5a4d3c370 [experimental]: fp16 softmax (EXP2_HF) to accelerate fa Yiwei Shao 2026-04-22 19:07:06 -0700
  • 3d1b4ea2f0 hmx: Add an asm memory clobber at the phase boundary to prevent reorder bug Yiwei Shao 2026-04-22 18:40:38 -0700
  • 4f42c8a939 hmx: optimize FA softmax mask phase (no-ALiBi fast path + GQA dedup) Yiwei Shao 2026-04-15 20:22:07 -0700
  • 5ce4ad9db0 hmx: relax matmul pipeline gate to cover k > n shapes (e.g. FFN_down) Yiwei Shao 2026-04-15 19:46:48 -0700
  • a9cd7e35aa hmx: multi-thread Q load / O store and enable prefill FA dispatch Yiwei Shao 2026-04-15 19:38:47 -0700
  • c82b0069ee hmx: unify interleave helper Yiwei Shao 2026-04-14 20:19:13 -0700
  • 8ae3318b3f hmx: apply upstream optimization to hmx-flash-attn-ops.c apply restrict, __builtin_assume, and pointer accumulation to the three HMX workers (qk_dot, o_update, o_norm) and the matching inline HMX loops in op_hmx_flash_attn_ext. Yiwei Shao 2026-04-14 20:18:31 -0700
  • 0b9b50e3f5 hmx: drop the duplicate interleave_fp16_weight_chunk_to_tiles Yiwei Shao 2026-04-14 20:16:56 -0700
  • 3c77050e65 hmx: replace asm wrappers with Q6_ intrinsics in hmx-utils.h Yiwei Shao 2026-04-14 17:40:17 -0700
  • 35b2f3fd22 hmx: add HMX-accelerated flash attention for prefill Yiwei Shao 2026-04-11 19:26:25 -0700
  • 29853cb4c2 hmx: extract shared interleave headers and unify matmul batched Yiwei Shao 2026-04-16 14:39:59 -0700
  • a4ec494d63 hexagon: move HVX f32 add/sub/mul wrappers to hvx-base.h Todor Boinovski 2026-04-23 16:48:08 -0700
  • 976b9d9d9f hexagon: vectorize partial f32 loads Todor Boinovski 2026-04-22 11:51:32 -0700
  • 82f2809742 hexagon: chunk vs batch processing for better thread utilization Todor Boinovski 2026-04-21 23:05:34 -0700
  • fdfa53b112 hexagon: rm unused variable/function warnings Mengsheng Wu 2026-04-15 17:35:54 +0800
  • 67e017eb02 ggml: fix TODO description for solve_tri Mengsheng Wu 2026-04-08 20:34:19 +0800
  • 96b32ff2e1 hexagon: add SOLVE_TRI op Mengsheng Wu 2026-04-08 20:31:07 +0800
  • 3b91c06543
    Merge c8163b69b8 into e5f070a1dc lekot 2026-04-24 02:44:16 +0300
  • 49c762fe5b docker: propagate OCI labels as manifest and index annotations Samaresh Kumar Singh 2026-04-22 15:16:32 -0500
  • ac114bc0c5 docker: add OCI image labels to all published images Samaresh Kumar Singh 2026-04-08 21:21:22 -0500
  • e5f070a1dc
    fix(shader): handle the buffer aliasing for rms fuse (#22266) b8913 Chen Yuan 2026-04-23 19:32:59 -0400