- Add mBART encoder/decoder architecture for text generation
- Implement Swin Transformer for vision encoding
- Add cross-attention support for multimodal fusion
- Create conversion scripts for the facebook/nougat-base model
- Add nougat-cli tool for document OCR processing
- Support multiple output formats (markdown, LaTeX, plain text)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>