FaMoS preprocessing

The training pipeline operates on distorted, sub-sampled multi-view image grids rather than the raw FaMoS captures. This document describes how to generate the directory layout the training code expects.

1. Obtain the FaMoS dataset

Register at the FaMoS / TEMPEH project page, agree to the license, and download the data with the fetch scripts in ../famos_download/ (see famos_download/README.md). After unpacking you should have:

<famos_root>/
├── downsampled_images_4_no_grid/
│   ├── downsampled_images_4/        # 4×-downsampled per-view RGB
│   └── calibrations/                # per-view camera calibrations
├── meshes_npz/                      # ground-truth scan meshes (.npz)
└── registrations/                   # FLAME registrations
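Before moving on, it can help to confirm that the unpacked tree matches the layout above. The following is a minimal sketch, not part of the repository: `missing_famos_dirs` and `EXPECTED_SUBDIRS` are hypothetical names, and the check only looks for the four directories listed above, not their contents.

```python
from pathlib import Path

# Sub-directories from the layout above, relative to <famos_root>.
# (Hypothetical helper; not part of the repository.)
EXPECTED_SUBDIRS = (
    "downsampled_images_4_no_grid/downsampled_images_4",
    "downsampled_images_4_no_grid/calibrations",
    "meshes_npz",
    "registrations",
)


def missing_famos_dirs(famos_root):
    """Return the expected sub-directories that are absent under famos_root."""
    root = Path(famos_root)
    return [rel for rel in EXPECTED_SUBDIRS if not (root / rel).is_dir()]
```

An empty list means all four directories are present; any returned entry names a directory that still needs to be downloaded or unpacked.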

You also need the dense-landmark predictions used as supervision during training. These come from our companion dense-landmark-detector model:

mkdir -p famos_dense_landmarks && cd famos_dense_landmarks
gdown 19F8IdfmxZw4aXqSvvYlp7Z3R_Ek9vRvQ -O color_dense_landmarks.zip          # ~30 GB
gdown 1UOtmoTGXdFV4dP9TtmRUHEG9XACc2O_8 -O color_dense_semantic_landmarks.zip # ~1.2 GB
unzip color_dense_landmarks.zip
unzip color_dense_semantic_landmarks.zip
cd ..

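This produces famos_dense_landmarks/{color_dense_landmarks,color_dense_semantic_landmarks}/<subject>/<sequence>/<frame>/…. Pass these two directories to the --dense-landmarks-dir and --dense-semantic-landmarks-dir flags of build_grids.py in §2 below; the trainer in §3 reads the versions that build_grids.py writes under <OUTPUT_ROOT>.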
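If an unzip was interrupted, the two landmark trees can end up covering different frames. The sketch below compares them at the `<subject>/<sequence>/<frame>` level. It assumes that each `<frame>` is a directory (as the trailing `/…` in the path suggests) and that both archives are meant to cover the same frames, which this document does not state; `frame_keys` and `frame_mismatch` are hypothetical helpers.

```python
from pathlib import Path


def frame_keys(landmarks_dir):
    """Collect '<subject>/<sequence>/<frame>' keys three levels below landmarks_dir."""
    root = Path(landmarks_dir)
    return {
        p.relative_to(root).as_posix()
        for p in root.glob("*/*/*")
        if p.is_dir()  # assumption: every frame is stored as a directory
    }


def frame_mismatch(dense_dir, semantic_dir):
    """Return (frames only in dense_dir, frames only in semantic_dir), both sorted."""
    dense = frame_keys(dense_dir)
    semantic = frame_keys(semantic_dir)
    return sorted(dense - semantic), sorted(semantic - dense)
```

Calling `frame_mismatch('famos_dense_landmarks/color_dense_landmarks', 'famos_dense_landmarks/color_dense_semantic_landmarks')` returns two empty lists when the trees agree.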

2. Generate the multi-view grids

build_grids.py (in this datasets/ folder) loads each frame, renders the ground-truth normal and depth maps from the scan through each view's intrinsics + radial distortion, and packs everything into the multi-view grid layout the trainer reads.

Run it from the repo root:

python -m datasets.build_grids \
    --data-list assets/meshes_list.json \
    --image-dir            <famos_root>/downsampled_images_4_no_grid/downsampled_images_4 \
    --calibration-dir      <famos_root>/downsampled_images_4_no_grid/calibrations \
    --scan-dir             <famos_root>/meshes_npz \
    --registration-dir     <famos_root>/registrations \
    --dense-landmarks-dir  famos_dense_landmarks/color_dense_landmarks \
    --dense-semantic-landmarks-dir famos_dense_landmarks/color_dense_semantic_landmarks \
    --out-root             <OUTPUT_ROOT>

Use --start <i> and --end <j> to process a slice of the data list (handy for sharding across nodes); see python -m datasets.build_grids --help for every flag.
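When sharding across nodes, each node needs its own pair of --start / --end values. The sketch below splits a data list of a given length into contiguous slices. It assumes Python-slice semantics (--start inclusive, --end exclusive), which this document does not confirm; check the --help output before relying on it. `shard_bounds` is a hypothetical helper, not part of the repository.

```python
def shard_bounds(num_items, num_shards):
    """Split range(num_items) into num_shards contiguous (start, end) slices.

    Assumes 'end' is exclusive; the first (num_items % num_shards) shards
    receive one extra item so that sizes differ by at most one.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    base, extra = divmod(num_items, num_shards)
    bounds = []
    start = 0
    for shard in range(num_shards):
        end = start + base + (1 if shard < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds
```

For a data list with 10 entries and 3 nodes, `shard_bounds(10, 3)` gives `[(0, 4), (4, 7), (7, 10)]`; node k would then append `--start` and `--end` with the k-th pair to the command above.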

The script writes:

<OUTPUT_ROOT>/
├── color_images/            # multi-view RGB grids
├── color_normals/           # rendered normal-map preview grids (.png)
├── color_normals_numpy/     # rendered normal-map training grids (.npy)
├── color_depth/             # rendered depth-map grids (.npy)
├── color_cameras/           # per-view intrinsics + extrinsics + centers + radial distortions
├── color_dense_landmarks/   # dense landmark predictions reprojected
└── color_dense_semantic_landmarks/ # dense semantic / mediapipe landmark predictions

3. Wire the trainer to your output

In each stage script under scripts/, set the data-related CLI flags to your output paths:

-tdl  /path/to/your_train_data_list.json
-vdl  /path/to/your_val_data_list.json
--scan-directory      <famos_root>/meshes_npz
--processed-directory <famos_root>/registrations
--image-directory       <OUTPUT_ROOT>/color_images
--normals-image-directory  <OUTPUT_ROOT>/color_normals_numpy
--depths-image-directory   <OUTPUT_ROOT>/color_depth
--calibration-directory    <OUTPUT_ROOT>/color_cameras
--dense-landmarks-dir      <OUTPUT_ROOT>/color_dense_landmarks
--dense-semantic-landmarks-dir <OUTPUT_ROOT>/color_dense_semantic_landmarks
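Two of these flags point into <famos_root> and the rest into <OUTPUT_ROOT>, which is easy to mix up when editing several stage scripts by hand. As a sketch, the mapping above can be generated from the two roots; `trainer_data_flags` is a hypothetical helper, and only the flag names and sub-directory names are taken from this document.

```python
from pathlib import Path


def trainer_data_flags(famos_root, output_root, train_list, val_list):
    """Build the data-related trainer flags listed above as an argv-style list."""
    famos = Path(famos_root)
    out = Path(output_root)
    return [
        "-tdl", str(train_list),
        "-vdl", str(val_list),
        # These two still read from the raw FaMoS download.
        "--scan-directory", str(famos / "meshes_npz"),
        "--processed-directory", str(famos / "registrations"),
        # Everything else reads from the build_grids output.
        "--image-directory", str(out / "color_images"),
        "--normals-image-directory", str(out / "color_normals_numpy"),
        "--depths-image-directory", str(out / "color_depth"),
        "--calibration-directory", str(out / "color_cameras"),
        "--dense-landmarks-dir", str(out / "color_dense_landmarks"),
        "--dense-semantic-landmarks-dir", str(out / "color_dense_semantic_landmarks"),
    ]
```

The standard-library call `shlex.join(flags)` renders the list as a single shell-quoted string that can be pasted into a stage script.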

4. Sanity check

Before launching a long training run, verify that the dataset can be constructed and can load one sample:

python -c "
from datasets.face_align_dataset_mpi_grid import FaceAlignDatasetMPI
ds = FaceAlignDatasetMPI(
    data_list_fname='<your_train_data_list>.json',
    image_dir='<OUTPUT_ROOT>/color_images',
    calibration_dir='<OUTPUT_ROOT>/color_cameras',
    scan_dir='<famos_root>/meshes_npz',
    registration_root_dir='<famos_root>/registrations',
    normals_dir='<OUTPUT_ROOT>/color_normals_numpy',
    depths_dir='<OUTPUT_ROOT>/color_depth',
    dense_landmarks_dir='<OUTPUT_ROOT>/color_dense_landmarks',
    dense_semantic_landmarks_dir='<OUTPUT_ROOT>/color_dense_semantic_landmarks',
    image_resize_factor=2,
    image_file_ext='png',
)
print('dataset size:', len(ds))
print('sample keys:', list(ds[0].keys())[:10])
"