Default Branch

58062860af · ggml : use WARP_SIZE/2 for argmax reduction offset (#18092) · Updated 2025-12-16 19:47:01 -08:00

Branches

91e67b8583 · imatrix : fix 3d tensor counts · Updated 2025-07-31 08:56:38 -07:00 · 1397 behind, 4 ahead

b98f80a6b4 · server : test alternative LRU logic · Updated 2025-07-29 11:19:21 -07:00 · 1418 behind, 1 ahead

0591b39e48 · ops: add MUSA · Updated 2025-07-29 02:25:32 -07:00 · 1424 behind, 1 ahead

381879e0ac · cont : tmp · Updated 2025-07-28 21:42:55 -07:00 · 1448 behind, 3 ahead

fb371c18ec · bench,common : add CPU extra buffer types · Updated 2025-07-28 11:53:18 -07:00 · 1425 behind, 1 ahead

e9f7e7cce2 · ops : update BLAS · Updated 2025-07-27 23:42:57 -07:00 · 1435 behind, 1 ahead

a5801f408f · sync : ggml · Updated 2025-07-25 04:31:39 -07:00 · 1454 behind, 2 ahead

6f4c57236b · server : fix vision test regex · Updated 2025-07-25 01:22:36 -07:00 · 1476 behind, 1 ahead

e65aa69402 · context : only sort outputs when needed · Updated 2025-07-24 08:06:34 -07:00 · 1463 behind, 1 ahead

a124399f19 · sched : fix multiple evaluations of the same graph with pipeline parallelism · Updated 2025-07-24 07:03:14 -07:00 · 1463 behind, 1 ahead

978c88ba0a · cont : add TODO · Updated 2025-07-24 06:31:10 -07:00 · 1465 behind, 2 ahead

1ef3cc1a87 · imatrix : use GGUF regardless of the output filename · Updated 2025-07-23 20:22:41 -07:00 · 1470 behind, 2 ahead

55cf48de1e · cuda : fix multi-seq, quantized FA · Updated 2025-07-22 10:48:53 -07:00 · 1512 behind, 2 ahead

0a0af0dbbd · Vulkan: Fix fprintf format-security warning · Updated 2025-07-19 02:45:31 -07:00 · 1506 behind, 1 ahead

386892ec61 · sync : ggml · Updated 2025-07-19 01:46:12 -07:00 · 1507 behind, 1 ahead

cfe5e98423 · graph : fix graph reuse reset of params · Updated 2025-07-18 07:50:32 -07:00 · 1510 behind, 1 ahead

9106d7595d · model : fix build after merge conflict · Updated 2025-07-18 01:50:59 -07:00 · 1513 behind, 1 ahead

05baa62a73 · kv-cache : fix k-shift for multiple streams · Updated 2025-07-17 10:18:36 -07:00 · 1522 behind, 1 ahead

07908a824a · server : pre-calculate EOG logit biases · Updated 2025-07-16 03:47:05 -07:00 · 1535 behind, 1 ahead

9f8d285901 · server : fix handling of the ignore_eos flag · Updated 2025-07-15 21:37:18 -07:00 · 1540 behind, 1 ahead