Default Branch

382808c14b · ci : re-enable rocm build on amd64 (#18439) · Updated 2025-12-28 15:29:23 -08:00

Branches

Each entry lists the branch head commit, its message, the last-update time, and the branch's divergence from the default branch (commits behind, commits ahead).

6494509801 · backup · Updated 2024-08-26 01:58:54 -07:00 · 3948 behind, 2 ahead
ccb45186d0 · docs : remove references · Updated 2024-08-25 23:52:17 -07:00 · 3942 behind, 2 ahead
8062650343 · llama : fix simple splits when the batch contains embeddings · Updated 2024-08-21 12:09:03 -07:00 · 3953 behind, 19 ahead
9127800d83 · wip · Updated 2024-08-16 16:51:06 -07:00 · 3986 behind, 2 ahead
62d7b6c87f · cuda : re-add q4_0 · Updated 2024-08-14 03:37:03 -07:00 · 3982 behind, 3 ahead
93ec58b932 · server : fix typo in comment · Updated 2024-08-13 19:12:26 -07:00 · 3984 behind, 4 ahead
faaac59d16 · llama : support NUL bytes in tokens · Updated 2024-08-11 18:00:03 -07:00 · 3995 behind, 1 ahead
73bc9350cd · gguf-py : Numpy dequantization for grid-based i-quants · Updated 2024-08-09 20:47:31 -07:00 · 4015 behind, 2 ahead
9329953a61 · llama : avoid double tensor copy when saving session to buffer · Updated 2024-08-07 13:03:34 -07:00 · 4023 behind, 2 ahead
7764ab911d · update guide · Updated 2024-08-07 07:01:02 -07:00 · 4024 behind, 1 ahead
cad8abb49b · add tool to allow plotting tensor allocation maps within buffers · Updated 2024-08-06 13:09:51 -07:00 · 4032 behind, 1 ahead
6e299132e7 · clip : style changes · Updated 2024-08-06 01:44:29 -07:00 · 4356 behind, 56 ahead
16dab13bde · correct cmd name · Updated 2024-08-05 09:15:33 -07:00 · 4041 behind, 1 ahead
bddcc5f985 · llama : better replace_all · Updated 2024-08-04 03:42:08 -07:00 · 4057 behind, 1 ahead
229c35cb59 · gguf-py : remove LlamaFileTypeMap · Updated 2024-08-03 18:22:37 -07:00 · 4060 behind, 5 ahead
eab4a88210 · Using dp4a ptx intrinsics for an improved Mul8MAT perf [By Alcpz] · Updated 2024-07-29 08:52:29 -07:00 · 4078 behind, 1 ahead
9cddd9aeec · llama : cast seq_id in comparison with unsigned n_seq_max · Updated 2024-07-27 12:50:23 -07:00 · 4116 behind, 7 ahead
9aeb0e1f75 · sycl add conv support · Updated 2024-07-25 05:15:02 -07:00 · 4105 behind, 1 ahead
5934580905 · ggml : add and use ggml_cpu_has_llamafile() · Updated 2024-07-24 01:31:41 -07:00 · 4116 behind, 1 ahead
fe28a7b9d8 · llama : clean-up · Updated 2024-07-22 22:38:50 -07:00 · 4124 behind, 11 ahead