llama.cpp

Author	SHA1	Message	Date
Oleksandr Kuvshynov	66982abcb1	fixes	2024-05-24 12:22:59 -04:00
Oleksandr Kuvshynov	02e2c91d01	correct split id	2024-05-24 09:52:28 -04:00
Oleksandr Kuvshynov	60fe62e6eb	some renaming	2024-05-22 23:52:36 -04:00
Oleksandr Kuvshynov	479c80a0db	duo: cleanup v2	2024-05-22 23:31:23 -04:00
Oleksandr Kuvshynov	eecdd3b0ce	duo: first ~working option	2024-05-22 23:02:31 -04:00
Oleksandr Kuvshynov	2849247c4f	duo: more cleanup	2024-05-21 22:45:59 -04:00
Oleksandr Kuvshynov	f3965704fd	duo: simplify a little	2024-05-21 22:31:52 -04:00
Oleksandr Kuvshynov	d52d193e58	duo v0: setting up RPC + callback on each split completion. 1. Start the RPC server on the local instance on two different ports with 5GB allocated each. 2. Set up another callback on completion of a split. This seems cleaner than trying to second-guess which tensor is the boundary of a split. 3. Run it with an 8B model @ 4bit, observe split_done captured at a reasonable place. Next step: bring back linear speculation and start speculating on other remote instances.	2024-05-21 16:11:30 -04:00
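
The earliest commit above describes firing a callback when each split completes, instead of guessing which tensor marks a split boundary. The following is a minimal sketch of that idea only, not the code from these commits: `Split`, `SplitDoneCallback`, `run_splits`, and `demo` are hypothetical names made up for illustration and are not part of the llama.cpp or ggml API, and the "compute" step is a placeholder print.

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one split of a compute graph; not a llama.cpp type.
struct Split {
    int id;
    std::vector<std::string> nodes; // names of the tensors computed in this split
};

// Hypothetical callback type: invoked once after a split finishes, with its id.
using SplitDoneCallback = std::function<void(int split_id)>;

// Runs the splits in order and fires the callback at each split boundary,
// so the caller never has to infer the boundary from tensor names.
void run_splits(const std::vector<Split> & splits, const SplitDoneCallback & on_split_done) {
    for (const Split & split : splits) {
        for (const std::string & node : split.nodes) {
            std::printf("compute %s\n", node.c_str()); // placeholder for real work
        }
        if (on_split_done) {
            on_split_done(split.id);
        }
    }
}

// Example: two splits, as if each were offloaded to its own RPC server.
// Returns the split ids in the order their completion callbacks fired.
std::vector<int> demo() {
    std::vector<Split> splits = {
        {0, {"layer0", "layer1"}},
        {1, {"layer2", "layer3"}},
    };
    std::vector<int> done;
    run_splits(splits, [&done](int split_id) { done.push_back(split_id); });
    return done;
}
```

Calling `demo()` records the split ids in completion order. In a speculation setup like the one the commit mentions as a next step, the callback would be the natural place to hand work to another instance, though that part is not sketched here.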