llama.cpp/examples/duo/README.md

duo

Minimal example. The following is not implemented yet, but each piece can be added separately:

  • tree-based speculation
  • correct sampling
  • support for more than 2 instances
  • speculation on more than one instance (currently just one instance speculates)