Julia Longtin | b34575b1f3 | add missing jump. | 2024-05-11 12:53:23 +00:00
Julia Longtin | fa0226c8df | look at the right final memory location. | 2024-05-11 11:27:52 +00:00
Julia Longtin | fba57c125c | subtract the correct amount. | 2024-05-11 11:11:15 +00:00
Julia Longtin | 3156e639bf | change from handling three iterations per loop to four. | 2024-05-11 11:07:16 +00:00
Julia Longtin | a82ada7dcd | comment clarification. | 2024-05-10 21:57:16 +00:00
Julia Longtin | 4a3c42c82c | correct a comment, and use jz when comparing to zero. | 2024-05-10 20:30:56 +00:00
Julia Longtin | 806472787d | use values inside of the loop as soon as we have them. | 2024-05-10 19:33:58 +00:00
Julia Longtin | 21a1e740c2 | fix loop. | 2024-05-10 17:07:27 +00:00
Julia Longtin | 7e44eabe0f | move sub earlier, and move the compare of iterations to outside, and at the end of the loop. | 2024-05-10 17:03:41 +00:00
Julia Longtin | 7966c8e443 | spacing and comment changes. | 2024-05-10 16:50:39 +00:00
Julia Longtin | 650094e17b | remove useless prefetches. | 2024-05-10 16:28:53 +00:00
Julia Longtin | 0ff7d5dd1a | perform better prefetches, and invert the test of our clear flag for clarity. | 2024-05-10 16:14:28 +00:00
Julia Longtin | b00607d1ab | use vbroadcastss in place of vbroadcast32x4. | 2024-05-10 15:52:35 +00:00
Julia Longtin | f6edcc4061 | Use a vectorized assembly function to handle remaining chunks less than vector wide. | 2024-05-10 14:52:46 +00:00
Julia Longtin | 2282ac4d9f | broadcast a single int8, instead of 4 of them. | 2024-05-10 14:19:27 +00:00
Julia Longtin | 81ca166ecd | minor spacing and comment changes. | 2024-05-09 16:57:59 +00:00
Julia Longtin | 53773e0b4a | replace tabs with spaces. | 2024-04-03 23:42:34 +00:00
Julia Longtin | 9152143fe7 | reformat, and label what these files are. | 2024-04-03 23:21:24 +00:00
Julia Longtin | 6f67ea886f | formatting changes. | 2024-04-03 20:24:00 +00:00
Julia Longtin | bb5eb95816 | use better memory save operator. | 2024-03-23 20:49:11 +00:00
Julia Longtin | 8f57803f58 | import stdio.h for size_t. | 2024-03-23 14:29:59 +00:00
Julia Longtin | 9bcb8350d5 | import stdint.h for size_t. | 2024-03-23 14:28:29 +00:00
Julia Longtin | ac3637142d | formatting changes. | 2024-03-20 21:34:12 +00:00
Julia Longtin | ee27148629 | remove intrinsics import, and use upConv to save 12 bytes of memory transit. | 2024-03-20 20:15:30 +00:00
Julia Longtin | ab6f3a8a8d | Update ggml-phi-knc.c | 2024-03-17 21:36:14 +00:00
Julia Longtin | fe663c1b63 | merge from upstream | 2024-03-17 21:15:32 +00:00
Julia Longtin | 717e164dd7 | implement F32 dot products. | 2024-03-16 14:05:03 +00:00