Default Branch

382808c14b · ci : re-enable rocm build on amd64 (#18439) · Updated 2025-12-28 15:29:23 -08:00

Branches

d857e5192e · quantize : check imatrix for nan/inf values · Updated 2024-06-06 13:44:24 -07:00 · 4464 · 2

731e7528be · server : fix --threads-http arg · Updated 2024-06-06 06:37:12 -07:00 · 4465 · 1

f7d4b7c343 · build only main and server in their docker images · Updated 2024-06-05 15:13:01 -07:00 · 4472 · 2

3d2e79da7f · add openmp lib to dockerfiles · Updated 2024-06-05 15:05:25 -07:00 · 4472 · 1

0085f94936 · server : add /v1/completion endpoint · Updated 2024-06-04 05:58:14 -07:00 · 4482 · 1

5f8720fb7b · add rpc-server to Makefile · Updated 2024-05-31 08:22:05 -07:00 · 4517 · 3

956af1552a · server : update js · Updated 2024-05-31 05:47:19 -07:00 · 4508 · 1

77c16ee0d4 · tests : disable json test due to lack of python on the CI node · Updated 2024-05-31 04:16:54 -07:00 · 4521 · 3

d32a8f6142 · backup · Updated 2024-05-31 01:51:56 -07:00 · 4518 · 2

8a8f8b953f · llama : print a log of the total cache size · Updated 2024-05-29 11:45:43 -07:00 · 4527 · 4

1ca802a3e0 · parallelize fattn compilation test · Updated 2024-05-27 16:19:36 -07:00 · 4553 · 6

ddc59e8e0a · wipwipwiwpip · Updated 2024-05-27 02:04:09 -07:00 · 4575 · 17

4b1770109c · Fix q_xxs using mul_mat_q · Updated 2024-05-27 01:46:37 -07:00 · 4558 · 1

1c6cde92bb · metal : disable FA kernel for HS=256 · Updated 2024-05-26 23:57:20 -07:00 · 4560 · 1

11f78c6a2d · convert-hf : adapt ArcticModel to use yield too · Updated 2024-05-25 09:52:53 -07:00 · 4567 · 4

dd14d818e0 · Update main-intel.Dockerfile base image to 2024.1.0 · Updated 2024-05-23 19:47:58 -07:00 · 4578 · 1

c5fe1d6cdc · gguf-py : remove unused import · Updated 2024-05-22 21:09:49 -07:00 · 4593 · 2

518b75260b · cuda uma test · Updated 2024-05-22 18:13:48 -07:00 · 4593 · 1

e9095e6098 · async direct io per tensor test · Updated 2024-05-21 16:08:52 -07:00 · 4612 · 3

a041ced0fd · wip · Updated 2024-05-20 08:20:49 -07:00 · 4618 · 1