SEAR: Simple and Efficient Adaptation of Visual Geometric Transformers for RGB+Thermal 3D Reconstruction
This project jointly estimates camera poses for RGB and thermal images.
Clone this repo and VGGT:
git clone https://github.com/Schindler-EPFL-Lab/SEAR.git
cd SEAR
git clone https://github.com/facebookresearch/vggt.git
Install with uv:
uv sync --all-extras
Download the VGGT checkpoint VGGT-1B.
To train our model, run this script:
python sear/scripts/train_sear.py --thermal-vggt.vggt-path /path/to/vggt/weights.pth
Ablation studies can be run using the other aggregator types found in sear/ablation_models/possible_aggregators.py.
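The training command can also be wrapped in a small Python helper, for example to fail early when the checkpoint path is wrong. This is a minimal sketch, not part of SEAR: the script path and the --thermal-vggt.vggt-path flag come from the command in this README, while build_train_command and its check_exists parameter are hypothetical names introduced here for illustration.

```python
import sys
from pathlib import Path

# Script path and flag are copied from the training command in this README.
TRAIN_SCRIPT = "sear/scripts/train_sear.py"
VGGT_PATH_FLAG = "--thermal-vggt.vggt-path"


def build_train_command(vggt_weights, check_exists=True):
    """Return the argv list for the training command (hypothetical helper)."""
    weights = Path(vggt_weights)
    # Fail before launching training if the checkpoint file is missing.
    if check_exists and not weights.is_file():
        raise FileNotFoundError(f"VGGT checkpoint not found: {weights}")
    # sys.executable selects the interpreter that is running this helper.
    return [sys.executable, TRAIN_SCRIPT, VGGT_PATH_FLAG, str(weights)]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the SEAR repository root, so that the relative script path resolves.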
Models can be evaluated after training with sear/scripts/eval/ablation_vggt.py.
To run the evaluation, see the tutorials for camera pose and point cloud, relative camera pose from two views, and dependence on thermal ratio.
Our training dataset is a combination of the following datasets:
We provide a compilation of all training datasets as well as our own.
See the Dataset documentation for details of the data processing.
@misc{skorokhodov2026searsimpleefficientadaptation,
title={SEAR: Simple and Efficient Adaptation of Visual Geometric Transformers for RGB+Thermal 3D Reconstruction},
author={Vsevolod Skorokhodov and Chenghao Xu and Shuo Sun and Olga Fink and Malcolm Mielle},
year={2026},
eprint={2603.18774},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.18774},
}


