A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Updated Jun 17, 2026 - Python
[PRL 2024] This is the code repository for our label-free pruning and retraining technique for autoregressive Text-VQA Transformers (TAP, TAP†).
Enable intelligent retrieval, filtering, and summarization of scientific papers from multiple sources for efficient research and report generation.