filby89/mochi

MOCHI

Registration-Free Learnable Multi-View Capture of Faces
in Dense Semantic Correspondence

Project Page arXiv Video

Panagiotis P. Filntisis  ·  George Retsinas  ·  Radek Danecek  ·  Vanessa Sklyarova  ·  Petros Maragos  ·  Timo Bolkart

CVPR 2026

[Figure: MOCHI teaser]
MOCHI predicts topologically consistent 3D face meshes in dense semantic correspondence directly from calibrated multi-view images, and a test-time optimization (MOCHI-TTO) pass further sharpens the geometry.


1. Installation

The code targets Python 3.10 and CUDA 12.4. Create a fresh environment and run the installer:

python3.10 -m venv .venv
source .venv/bin/activate
bash install.sh

install.sh pulls PyTorch 2.5.1 (cu124), pytorch3d 0.7.8, MPI-IS mesh, kaolin, kornia, pyrender, trimesh, wandb, etc., and builds the vendored liegroups package under modules/liegroups. It also downloads the FLAME 2023 head model (flame2023_no_jaw.pkl), which is license-restricted: register at https://flame.is.tue.mpg.de/ and agree to the license first — the script will prompt for your FLAME username and password.
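
Before running the installer, it can help to confirm that the interpreter matches the Python 3.10 target. The helper below is a minimal sketch, not part of the repository: the function names `is_python_310` and `check_python` are hypothetical and only inspect the interpreter version.

```shell
#!/usr/bin/env bash
# Pre-flight check (illustrative; not shipped with MOCHI).

# Succeeds only for "3.10" or "3.10.x" version strings.
is_python_310() {
    case "$1" in
        3.10|3.10.*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Reports whether the given interpreter (default: python3) is a 3.10 release.
check_python() {
    local py="${1:-python3}" ver
    if ! command -v "$py" >/dev/null 2>&1; then
        echo "not found: $py" >&2
        return 1
    fi
    ver="$("$py" -c 'import platform; print(platform.python_version())')"
    if is_python_310 "$ver"; then
        echo "OK: $py is Python $ver"
    else
        echo "WARNING: $py is Python $ver, but the code targets 3.10" >&2
        return 1
    fi
}
```

A typical call would be `check_python python3.10` before creating the virtual environment. After `install.sh` finishes, `python -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())'` is one way to confirm that the CUDA build of PyTorch was picked up.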

2. Data

MOCHI uses the FaMoS dataset, released as part of TEMPEH. Follow the steps below to prepare the data.

a) Download FaMoS. Register at https://tempeh.is.tue.mpg.de/ and agree to the license, then use the (TEMPEH-provided) fetch scripts under famos_download/ — see famos_download/README.md for details:

cd famos_download
bash fetch_test_subset.sh     # quick start: small paper test subset
bash fetch_training_data.sh   # full training set (images, scans, FLAME registrations)
bash fetch_test_data.sh       # full test set
cd ..

b) Preprocess into multi-view grids. Follow datasets/preprocess.md, which uses datasets/build_grids.py to render the ground-truth normal/depth maps from the scans and pack them, alongside the multi-view RGB, cameras, and dense landmarks, into the grid layout the trainer reads.

3. Training

Training also needs dense-landmark predictions for FaMoS. These come from a landmark detector trained on synthetic data that is not shipped here, so we release the precomputed predictions as two Google Drive archives:

mkdir -p famos_dense_landmarks && cd famos_dense_landmarks
gdown 19F8IdfmxZw4aXqSvvYlp7Z3R_Ek9vRvQ -O color_dense_landmarks.zip          # ~30 GB
gdown 1UOtmoTGXdFV4dP9TtmRUHEG9XACc2O_8 -O color_dense_semantic_landmarks.zip # ~1.2 GB
unzip color_dense_landmarks.zip
unzip color_dense_semantic_landmarks.zip
cd ..

This yields famos_dense_landmarks/{color_dense_landmarks,color_dense_semantic_landmarks}/<subject>/<sequence>/<frame>/…. Point the trainer at them by editing scripts/_data_paths.sh (--dense-landmarks-dir / --dense-semantic-landmarks-dir).
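
A quick way to confirm that both archives unpacked into the layout described above is sketched here. The function name `check_landmark_dirs` is hypothetical, and the check assumes only what the path pattern states: that the first directory level under each archive folder holds one directory per subject.

```shell
#!/usr/bin/env bash
# Layout check for the extracted landmark archives (illustrative only).

check_landmark_dirs() {
    local root="${1:-famos_dense_landmarks}" name n status=0
    for name in color_dense_landmarks color_dense_semantic_landmarks; do
        if [ -d "$root/$name" ]; then
            # First-level directories correspond to <subject> in the layout.
            n="$(find "$root/$name" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')"
            echo "$name: $n subject directories"
        else
            echo "missing: $root/$name" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_landmark_dirs famos_dense_landmarks` from the repository root prints one count per archive and returns non-zero if either folder is absent. Matching counts for the two folders are a reasonable sign that both extractions completed.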

The model then trains in three sequential stages, with an optional fourth test-time-optimization pass. After editing the data paths in scripts/_data_paths.sh, run the stage launchers in order:

bash scripts/stage1_pretrain.sh   # coarse, no differentiable rendering
bash scripts/stage2_coarse.sh     # coarse + differentiable rendering   (needs stage 1)
bash scripts/stage3_local.sh      # local refinement                    (needs stage 2)
bash scripts/stage4_refine.sh     # optional per-scene TTO              (needs stage 3)
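
Because each stage depends on the one before it, a small wrapper that stops at the first failure can prevent a later stage from starting on an incomplete checkpoint. This is a sketch under that assumption; `run_stages` is a hypothetical helper, not a script in the repository.

```shell
#!/usr/bin/env bash
# Run stage launchers in order and stop at the first failure (illustrative).

run_stages() {
    local script
    for script in "$@"; do
        echo ">>> running $script"
        if ! bash "$script"; then
            echo "stage failed: $script" >&2
            return 1
        fi
    done
}
```

For the three training stages, this would be invoked as `run_stages scripts/stage1_pretrain.sh scripts/stage2_coarse.sh scripts/stage3_local.sh`, with `scripts/stage4_refine.sh` appended if the optional test-time optimization is wanted.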

4. Pretrained models

We release the trained MOCHI checkpoints so you can run the test-time optimization (stage 4) — or start local refinement (stage 3) — without retraining from scratch. Download them with gdown into pretrained_models/ (the default paths the stage scripts expect):

mkdir -p pretrained_models
gdown 1YFs_CUUyzwtjwrO-sbBZ3FWOXZdMjLGJ -O pretrained_models/global.pth   # coarse global model (stages 1-2)
gdown 1d0cCVz344RtKcB4X5DfverupZtHKumCS -O pretrained_models/local.pth    # local refinement model (stage 3)

scripts/stage3_local.sh and scripts/stage4_refine.sh default to pretrained_models/global.pth and pretrained_models/local.pth; override them via the PRETRAINED_CKPT / PRETRAINED_LOCAL_CKPT environment variables if you store the checkpoints elsewhere.
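
The override described above can be checked before launching a stage. The sketch below assumes, from the order in which they are listed, that `PRETRAINED_CKPT` points at the global checkpoint and `PRETRAINED_LOCAL_CKPT` at the local one; `check_ckpts` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Verify that the checkpoints a stage script would load are present (illustrative).

check_ckpts() {
    # Fall back to the default locations named in the README.
    local global_ckpt="${PRETRAINED_CKPT:-pretrained_models/global.pth}"
    local local_ckpt="${PRETRAINED_LOCAL_CKPT:-pretrained_models/local.pth}"
    local f status=0
    for f in "$global_ckpt" "$local_ckpt"; do
        if [ -s "$f" ]; then
            echo "found: $f"
        else
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

With checkpoints stored elsewhere, export both variables (for example to paths under a hypothetical `/data/ckpt/` directory), run `check_ckpts`, and then start `bash scripts/stage4_refine.sh` in the same shell so the stage script sees the same values.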

Acknowledgements

This work builds directly on TEMPEH (MPI-IS, 2023); much of the multi-view volumetric backbone and data tooling derives from it. We also use FLAME and pytorch3d / kaolin for differentiable rendering.

License

See the LICENSE file. This repository builds on the TEMPEH codebase; please respect the upstream license terms at https://tempeh.is.tue.mpg.de/license.html.

Citation

If you find this work useful, please consider citing:

@inproceedings{filntisis2026mochi,
    title     = {Registration-Free Learnable Multi-View Capture of Faces in Dense Semantic Correspondence},
    author    = {Filntisis, Panagiotis P. and Retsinas, George and Daněček, Radek and Sklyarova, Vanessa and Maragos, Petros and Bolkart, Timo},
    booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2026}
}

About

Official PyTorch implementation of MOCHI: Registration-Free Learnable Multi-View Capture of Faces in Dense Semantic Correspondence (CVPR 2026)
