Also implement support for some model variations:
- Support local attention.
- Support biases.
- Apply RoPE to only half of each vector.
- Support a different order of QKV weights.
Co-authored-by: Andrey Mikhaylov <amik@google.com>
Co-authored-by: Martin Bruse <zondolfin@gmail.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
Also add a script to help run sanitizer builds, and do some cleanup.
Co-authored-by: Andrey Mikhaylov <amik@google.com>
Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
Co-authored-by: Sami Boukortt <sboukortt@google.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>