textvqa

Here are 5 public repositories matching this topic...

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

deep-learning dialog pytorch vqa pretrained-models captioning multimodal multi-tasking textvqa hateful-memes

Official code for the paper "Spatially Aware Multimodal Transformers for TextVQA", published at ECCV 2020.

language vision eccv textvqa

PyTorch DataLoader for many VQA datasets

pytorch vqa dataloader gqa textvqa vqav2

[PRL 2024] Code repository for our label-free pruning and retraining technique for autoregressive Text-VQA Transformers (TAP, TAP†).

transformer textvqa pruning-algorithms

Enable intelligent retrieval, filtering, and summarization of scientific papers from multiple sources for efficient research and report generation.

sales awesome research deep-learning cpp end-to-end dialog pytorch vqa drones agents multimodal textvqa ai-researcher large-language-models
