Commit Graph

2499 Commits

Author SHA1 Message Date
Julia Longtin fba57c125c subtract the correct amount. 2024-05-11 11:11:15 +00:00
Julia Longtin 3156e639bf change from handling three iterations per loop to four. 2024-05-11 11:07:16 +00:00
Julia Longtin a82ada7dcd comment clarification. 2024-05-10 21:57:16 +00:00
Julia Longtin 4a3c42c82c correct a comment, and use jz when comparing to zero. 2024-05-10 20:30:56 +00:00
Julia Longtin 806472787d use values inside of the loop as soon as we have them. 2024-05-10 19:33:58 +00:00
Julia Longtin 21a1e740c2 fix loop. 2024-05-10 17:07:27 +00:00
Julia Longtin 7e44eabe0f move sub earlier, and move the compare of iterations to outside, and at the end of the loop. 2024-05-10 17:03:41 +00:00
Julia Longtin 7966c8e443 spacing and comment changes. 2024-05-10 16:50:39 +00:00
Julia Longtin 650094e17b remove useless prefetches. 2024-05-10 16:28:53 +00:00
Julia Longtin 0ff7d5dd1a perform better prefetches, and invert the test of our clear flag for clarity. 2024-05-10 16:14:28 +00:00
Julia Longtin b00607d1ab use vbroadcastss in place of vbroadcast32x4. 2024-05-10 15:52:35 +00:00
Julia Longtin f6edcc4061 Use a vectorized assembly function to handle remaining chunks less than vector wide. 2024-05-10 14:52:46 +00:00
Julia Longtin 2282ac4d9f broadcast a single int8, instead of 4 of them. 2024-05-10 14:19:27 +00:00
Julia Longtin 867de5edce use different restrict syntax, to make g++ happy. 2024-05-09 23:08:43 +00:00
Julia Longtin e1fdfaae45 fix typo 2024-05-09 20:41:50 +00:00
Julia Longtin a283551db0 remove a warning. 2024-05-09 20:40:50 +00:00
Julia Longtin af4ee51fa7 add batch fp16<->fp32 conversion functions. 2024-05-09 19:31:28 +00:00
Julia Longtin 81ca166ecd minor spacing and comment changes. 2024-05-09 16:57:59 +00:00
Julia Longtin 047291fb42 spacing and capitalization changes. Fix the register list of GGML_5bit_Unpacked_Unaligned. 2024-04-26 14:44:08 +00:00
Julia Longtin 77d4ca906b spacing and capitalization changes. 2024-04-25 21:23:22 +00:00
Julia Longtin d69cf87fce use or, instead of and. bug fix? 2024-04-24 17:50:12 +00:00
Julia Longtin 8cae9a9ef6 comment and spacing fixes. 2024-04-24 17:38:42 +00:00
Julia Longtin 90e99eaf1c fix an offset error, and get rid of tabs. 2024-04-22 18:29:31 +00:00
Julia Longtin 6d16090246 fix some small errors. 2024-04-22 18:22:22 +00:00
Julia Longtin e298d9e65e further optimizations. 0.99 tokens per second. 2024-04-22 18:16:28 +00:00
Julia Longtin 53773e0b4a replace tabs with spaces. 2024-04-03 23:42:34 +00:00
Julia Longtin 9152143fe7 reformat, and label what these files are. 2024-04-03 23:21:24 +00:00
Julia Longtin 9ad5efafb0 use GGML_F32_EPR, and remove some dead code. 2024-04-03 22:04:45 +00:00
Julia Longtin 84df774d6a whoops. missing tab. 2024-04-03 21:58:29 +00:00
Julia Longtin 9412572205 add Makefile rule for generation .s file, for manual inspection. 2024-04-03 20:30:25 +00:00
Julia Longtin 6f67ea886f formatting changes. 2024-04-03 20:24:00 +00:00
Julia Longtin 96fdd214c8 indent headers consistently. 2024-04-03 19:01:18 +00:00
Julia Longtin cb4422625a Merge pull request #1 from julialongtin/k1om: K1om initial support. Round 1. 2024-04-02 17:07:46 +00:00
Julia Longtin 47190a7fe2 formatting. 2024-04-02 17:01:53 +00:00
Julia Longtin 8c17353717 minor changes. 2024-04-02 16:55:40 +00:00
Julia Longtin 9f569ca50b massively rewrite assembly routines. 2024-04-02 15:41:56 +00:00
Julia Longtin 12c9576aec fix vector sizes. 2024-03-25 19:43:37 +00:00
Julia Longtin bc3d6db862 separate filling aux16 from consuming aux16 by making it an array of vectors. 2024-03-24 14:18:08 +00:00
Julia Longtin ca0dc26704 loosen alignment requirements for zeros, add missing function, and promote aux8 to an array of vectors. 2024-03-24 13:35:05 +00:00
Julia Longtin cf481cf901 promote aux8 into a vector. 2024-03-24 12:50:01 +00:00
Julia Longtin 169a145409 fix our reference to src in the second place, and use a more accurate comment. 2024-03-24 12:41:21 +00:00
Julia Longtin c28bfe4552 spacing changes, eliminate dead references to k1 or zero, and use the right type when referring to src. 2024-03-24 12:37:47 +00:00
Julia Longtin ba4f4129b3 better comments, and fix some small errors. 2024-03-24 12:17:06 +00:00
Julia Longtin 03a3e0eb7a perform 16 operations at a time. 2024-03-24 12:04:44 +00:00
Julia Longtin 5935bb34f4 use proper mov operator, and pass addresses. 2024-03-23 23:46:36 +00:00
Julia Longtin a5132a1507 attempt our first FMA. 2024-03-23 22:16:57 +00:00
Julia Longtin 4477b8e123 add I32 vector memory clearing. 2024-03-23 21:16:23 +00:00
Julia Longtin ea1edb0600 promote aux32 to a vector. 2024-03-23 21:12:35 +00:00
Julia Longtin f967690a41 add missing address of operators. 2024-03-23 21:05:50 +00:00
Julia Longtin 2fdd11fe3a promote aux16 to a vector. 2024-03-23 21:00:51 +00:00