Default Branch

07a0c4ba92 · Revert "ggml-cuda: use CMAKE_CUDA_ARCHITECTURES if set when GGML_NATIVE=ON (#18413)" (#18426) · Updated 2025-12-28 04:53:36 -08:00

Branches

68e4fed4d9 · Now fix test-quantize-fns · Updated 2024-03-21 04:18:03 -07:00 · happyz · 5081 behind, 3 ahead

9a424a3872 · server : fix tests expecting old repeat penalty · Updated 2024-03-19 14:12:28 -07:00 · happyz · 5097 behind, 1 ahead

0a9bc301ac · control-vectors : minor code style updates · Updated 2024-03-14 07:43:37 -07:00 · happyz · 5136 behind, 3 ahead

abf0afd0d6 · ci : fix iOS builds to use embedded library · Updated 2024-03-14 02:34:22 -07:00 · happyz · 5154 behind, 4 ahead

9f805264dc · Attempt 2 · Updated 2024-03-12 09:40:13 -07:00 · happyz · 5154 behind, 3 ahead

5440a127c7 · iq1_s: fix dequantize on the CPU · Updated 2024-03-11 06:17:28 -07:00 · happyz · 5167 behind, 6 ahead

76be02aebc · sycl : fix grid type · Updated 2024-03-11 06:17:08 -07:00 · happyz · 5162 behind, 3 ahead

989e15b3c1 · Merge branch 'master' into sycl_q3s_q1s · Updated 2024-03-10 20:11:35 -07:00 · happyz · 5169 behind, 9 ahead

b54afce9f4 · mostly style fixes; fix KQ_mask comment · Updated 2024-03-09 11:03:46 -08:00 · happyz · 5212 behind, 10 ahead

0ba20ed97a · llama : compute BERT graph with F16 K, V · Updated 2024-03-07 06:33:30 -08:00 · happyz · 5202 behind, 1 ahead

b5b0270372 · Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" · Updated 2024-03-07 01:11:18 -08:00 · happyz · 5206 behind, 1 ahead

31cecc8734 · iq3_s_mult_shuffle: use lookup table on Metal · Updated 2024-03-05 00:19:44 -08:00 · happyz · 5280 behind, 24 ahead

4ec0e9abbf · wip · Updated 2024-03-04 07:07:12 -08:00 · happyz · 5229 behind, 5 ahead

eb0bf32caf · server: tests: schedule slow dispatch only on release or on demand · Updated 2024-03-02 14:18:31 -08:00 · happyz · 5241 behind, 1 ahead

0b673ca187 · s/_MODEL_CLASSES/_model_classes/ · Updated 2024-03-02 09:14:37 -08:00 · happyz · 5254 behind, 3 ahead

d4dfc250cc · Fix ARM_NEON · Updated 2024-03-02 00:12:02 -08:00 · happyz · 5259 behind, 7 ahead

f8ab539190 · convert : update help string · Updated 2024-03-01 09:29:34 -08:00 · happyz · 5257 behind, 3 ahead

9862d59c05 · llama : change starcoder2 rope type · Updated 2024-03-01 05:10:31 -08:00 · happyz · 5266 behind, 8 ahead

f0cbb6ddf6 · iq1_s: turn off SIMD implementation for QK_K = 64 (it does not work) · Updated 2024-02-27 22:28:10 -08:00 · happyz · 5281 behind, 6 ahead

14d757066b · llama : add llama_kv_cache_compress (EXPERIMENTAL) · Updated 2024-02-27 06:24:40 -08:00 · happyz · 5282 behind, 1 ahead