Default Branch

58062860af · ggml : use WARP_SIZE/2 for argmax reduction offset (#18092) · Updated 2025-12-16 19:47:01 -08:00

Branches

f68669d50f · fix and opt kernel launch · Updated 2025-07-15 04:28:26 -07:00 · 1575 behind, 3 ahead

942c55cd57 · imatrix : avoid using imatrix.dat in README · Updated 2025-07-12 13:50:10 -07:00 · 1560 behind, 32 ahead

1180752835 · cuda : support Falcon-H1 state size for SSM_SCAN · Updated 2025-07-09 09:18:37 -07:00 · 1590 behind, 1 ahead

4d6a179c68 · gguf-py : avoid adding duplicate tensor mappings for Jamba · Updated 2025-07-09 08:58:35 -07:00 · 1590 behind, 61 ahead

b7c6ece5b5 · ggml-ci · Updated 2025-07-09 05:13:34 -07:00 · 1595 behind, 24 ahead

7634d14d7a · test-model-random : fix seq_id buffer overflow · Updated 2025-07-08 15:23:58 -07:00 · 1595 behind, 18 ahead

2ff3354c33 · memory : fix broken batch splits for recurrent cache · Updated 2025-07-07 18:23:14 -07:00 · 1606 behind, 1 ahead

996195299e · up. · Updated 2025-07-07 14:42:40 -07:00 · 1758 behind, 6 ahead

bf8b39015f · metal : reuse graphs · Updated 2025-07-07 11:37:07 -07:00 · 1614 behind, 3 ahead

886da0a2c5 · kv-cache : prepare K/V buffers for separation · Updated 2025-07-04 00:13:16 -07:00 · 1625 behind, 1 ahead

dfceb012ee · llama : add "virtual sequences" · Updated 2025-07-02 10:26:55 -07:00 · 1636 behind, 8 ahead

71bef66591 · cuda : graceful fallback for Mamba-1 models with weird embd size · Updated 2025-07-02 00:49:36 -07:00 · 1645 behind, 44 ahead

6179578988 · batch : require non-coupled batch with sequential split_equal · Updated 2025-06-25 07:20:46 -07:00 · 1702 behind, 29 ahead

37bdfbef8c · wip 3 · Updated 2025-06-24 01:04:18 -07:00 · 1702 behind, 21 ahead

ae96333923 · metal : fix thread-safety · Updated 2025-06-20 06:42:54 -07:00 · 1728 behind, 1 ahead

6fb2f2e8a9 · ggml : fix repack work size for mul_mat_id · Updated 2025-06-20 00:34:16 -07:00 · 1731 behind, 1 ahead

59fee24c72 · recurrent : rework graph inputs + add TODOs · Updated 2025-06-17 23:29:51 -07:00 · 1755 behind, 31 ahead

d3d06debe3 · server : add pidfile option · Updated 2025-06-17 13:47:53 -07:00 · 1756 behind, 1 ahead

4b2233befb · Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer · Updated 2025-06-17 13:25:42 -07:00 · 1758 behind, 1 ahead

36fce98281 · server : re-enable swa speculative decoding · Updated 2025-06-12 01:51:15 -07:00 · 1799 behind, 1 ahead