duo

A minimal example. The following are not implemented, but each can be added separately:

  • tree-based speculation
  • correct sampling
  • support for more than 2 instances
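
This README does not say what "correct sampling" would involve. Assuming it refers to the standard speculative sampling rule (keep a drafted token with probability min(1, p/q), otherwise resample from the normalized residual max(0, p - q)), a minimal C++ sketch might look like the one below. All names here (`accept_draft`, `residual_distribution`, `speculative_sample`, `SampleResult`) are hypothetical and not part of llama.cpp, and `p` and `q` are assumed to be already-normalized probability vectors from the target and draft models.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: decide whether a drafted token is kept.
// p: target-model probabilities, q: draft-model probabilities,
// drafted: index of the token the draft model proposed, u: uniform in [0, 1).
bool accept_draft(const std::vector<float>& p, const std::vector<float>& q,
                  std::size_t drafted, float u) {
    // Same as u < min(1, p/q) when q > 0, written without a division.
    return u * q[drafted] < p[drafted];
}

// Hypothetical helper: distribution to resample from after a rejection,
// i.e. max(0, p - q) renormalized to sum to 1.
std::vector<float> residual_distribution(const std::vector<float>& p,
                                         const std::vector<float>& q) {
    std::vector<float> r(p.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < p.size(); ++i) {
        r[i] = std::max(0.0f, p[i] - q[i]);
        sum += r[i];
    }
    if (sum <= 0.0f) {
        return p;  // p and q coincide, so fall back to the target distribution
    }
    for (float& x : r) {
        x /= sum;
    }
    return r;
}

struct SampleResult {
    std::size_t token;  // token that ends up in the output
    bool accepted;      // true if the drafted token was kept
};

// Hypothetical driver: one accept-or-resample step for a single drafted token.
SampleResult speculative_sample(const std::vector<float>& p,
                                const std::vector<float>& q,
                                std::size_t drafted, std::mt19937& rng) {
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    if (accept_draft(p, q, drafted, uniform(rng))) {
        return {drafted, true};
    }
    const std::vector<float> r = residual_distribution(p, q);
    std::discrete_distribution<std::size_t> pick(r.begin(), r.end());
    return {pick(rng), false};
}
```

Under this rule the emitted tokens follow the target distribution `p` regardless of how good the draft distribution `q` is; a poor draft model only lowers the acceptance rate. Whether this is what the author had in mind for this example is an assumption.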