Pitch Manipulation to mitigate Gender Bias (ASRU 2023)

Code and models for the paper "No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition Through Pitch Manipulation", accepted at ASRU 2023.

Models and Outputs

To ensure complete reproducibility, we release the ASR model checkpoints used in our experiments, together with the SentencePiece model, the vocabulary file, the YAML configuration files, and the outputs obtained by each model:

Baseline: checkpoint | config.yaml | tst-COMMON.out | tst-HE.out
Baseline + VTLP: checkpoint | config.yaml | tst-COMMON.out | tst-HE.out
Baseline + Random: checkpoint | config.yaml | tst-COMMON.out | tst-HE.out
Baseline + Opposite: checkpoint | config.yaml | tst-COMMON.out | tst-HE.out
Baseline + Random - Formant Shifting: checkpoint | config.yaml | tst-COMMON.out | tst-HE.out
Baseline + Random - Formant Shifting - Gender Swapping: checkpoint | config.yaml | tst-COMMON.out | tst-HE.out
Vocabulary: vocab.txt | spm_model

Data Preprocessing

Data (MuST-C v1, en-es direction) have to be preprocessed with:

python /path/to/fbk-fairseq/examples/speech_to_text/preprocess_generic.py --data-root /data/to/mustc \
        --save-dir /data/to/mustc/save_folder --wav-dir /data/to/mustc/wav_folder \
        --split train, dev, tst-HE, tst-COMMON --vocab-type bpe --src-lang en --tgt-lang en \
        --task asr --n-mel-bins 80 --store-waveform

Training

The following parameters are intended for training on a system with 4 GPUs, each with 16 GB of VRAM. training_data and dev_data are the TSV files obtained from the preprocessing step, and config_file is one of the YAML files that can be downloaded above.

python train.py /path/to/data_folder \
        --train-subset training_data --valid-subset dev_data \
        --save-dir /path/to/save_folder \
        --num-workers 5 --max-update 50000 --patience 10 --keep-last-epochs 13 \
        --max-tokens 10000 --adam-betas '(0.9, 0.98)' \
        --user-dir examples/speech_to_text \
        --task speech_to_text_ctc --config-yaml config_file \
        --criterion ctc_multi_loss --underlying-criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
        --arch conformer \
        --ctc-encoder-layer 8 --ctc-weight 0.5 \
        --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \
        --warmup-updates 25000 \
        --clip-norm 10.0 \
        --seed 1 --update-freq 8 \
        --skip-invalid-size-inputs-valid-test \
        --log-format simple >> /path/to/save_folder/train.log 2> /path/to/save_folder/train.err
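
The hardware note matters because, in fairseq, the number of tokens consumed per optimizer step grows with the number of GPUs, `--max-tokens`, and `--update-freq`. The following is a minimal sketch of that arithmetic, assuming this standard fairseq behaviour applies unchanged to this setup. The helper names `tokens_per_update` and `update_freq_for` are hypothetical and not part of the repository.

```python
import math


def tokens_per_update(num_gpus: int, max_tokens: int, update_freq: int) -> int:
    """Upper bound on the tokens processed in one optimizer step."""
    return num_gpus * max_tokens * update_freq


def update_freq_for(num_gpus: int, max_tokens: int, target_tokens: int) -> int:
    """Smallest --update-freq reaching at least target_tokens per step."""
    return math.ceil(target_tokens / (num_gpus * max_tokens))


# Reference setup from the command above: 4 GPUs, --max-tokens 10000, --update-freq 8.
reference = tokens_per_update(num_gpus=4, max_tokens=10000, update_freq=8)
print(reference)  # 320000

# Hypothetical alternative setups that keep the same budget per step.
print(update_freq_for(num_gpus=1, max_tokens=10000, target_tokens=reference))  # 32
print(update_freq_for(num_gpus=8, max_tokens=10000, target_tokens=reference))  # 4
```

When training on a different number of GPUs, adjusting `--update-freq` in this way should keep the batch size per update comparable to the reference setup.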


After training, average 5 epoch checkpoints with:

python /path/to/fbk-fairseq/scripts/average_checkpoints.py --input /path/to/save_folder \
        --num-epoch-checkpoints 5 \
        --checkpoint-upper-bound $(ls /path/to/save_folder | head -n 5 | tail -n 1 | grep -o "[0-9]*") \
        --output /path/to/save_folder/avg5.pt
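
To make the role of `--num-epoch-checkpoints` and `--checkpoint-upper-bound` concrete, the sketch below mimics the selection step: among files named `checkpoint<N>.pt`, it keeps the 5 highest epochs that do not exceed the bound, and then averages their parameters element-wise. It is an illustration in plain Python, assuming the script follows the usual fairseq naming conventions; it is not the script's actual code. `select_checkpoints` and `average_params` are hypothetical names, and parameters are modelled as lists of floats rather than tensors.

```python
import re
from typing import Dict, List


def select_checkpoints(filenames: List[str], n: int, upper_bound: int) -> List[str]:
    """Return the n highest-epoch checkpoint files with epoch <= upper_bound."""
    pattern = re.compile(r"checkpoint(\d+)\.pt")
    epochs = []
    for name in filenames:
        match = pattern.fullmatch(name)
        if match and int(match.group(1)) <= upper_bound:
            epochs.append((int(match.group(1)), name))
    if len(epochs) < n:
        raise ValueError(f"found {len(epochs)} checkpoints, need {n}")
    return [name for _, name in sorted(epochs, reverse=True)[:n]]


def average_params(states: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Element-wise mean of every parameter across the given states."""
    averaged = {}
    for key in states[0]:
        columns = zip(*(state[key] for state in states))
        averaged[key] = [sum(col) / len(states) for col in columns]
    return averaged


# Hypothetical folder content: 13 epoch checkpoints plus the best/last files.
files = [f"checkpoint{i}.pt" for i in range(20, 33)]
files += ["checkpoint_best.pt", "checkpoint_last.pt"]
print(select_checkpoints(files, n=5, upper_bound=26))
# ['checkpoint26.pt', 'checkpoint25.pt', 'checkpoint24.pt', 'checkpoint23.pt', 'checkpoint22.pt']

print(average_params([{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]))
# {'w': [2.0, 4.0]}
```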

Inference

Inference can be executed with the following command (setting TEST_DATA to a TSV obtained from the preprocessing and CONFIG_FILE to one of the YAML files provided above):

python /path/to/fbk-fairseq/fairseq_cli/generate.py /path/to/data_folder \
        --gen-subset $TEST_DATA \
        --user-dir examples/speech_to_text \
        --max-tokens 40000 \
        --config-yaml $CONFIG_FILE \
        --beam 5 \
        --max-source-positions 10000 \
        --max-target-positions 1000 \
        --task speech_to_text_ctc \
        --criterion ctc_multi_loss \
        --underlying-criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
        --no-repeat-ngram-size 5 \
        --path /path/to/checkpoint > /path/to/output_file
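
The file written by `generate.py` is a log rather than a plain list of transcripts. Assuming it follows the standard fairseq layout, where each detokenized hypothesis is on a tab-separated line `D-<id>`, `<score>`, `<text>` and each reference on a line `T-<id>`, `<text>`, references and hypotheses can be recovered in sample order as sketched below. The function name `read_generate_output` is hypothetical, and the sample log is invented purely for illustration.

```python
from typing import Dict, List, Tuple


def read_generate_output(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Return (references, hypotheses), both sorted by sample id."""
    refs: Dict[int, str] = {}
    hyps: Dict[int, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        tag, _, sample_id = fields[0].partition("-")
        if not sample_id.isdigit():
            continue  # skip logging lines and summary statistics
        if tag == "T" and len(fields) >= 2:
            refs[int(sample_id)] = fields[1]
        elif tag == "D" and len(fields) >= 3:
            hyps[int(sample_id)] = fields[2]
    ids = sorted(hyps)
    return [refs.get(i, "") for i in ids], [hyps[i] for i in ids]


# Invented log excerpt; samples are not necessarily printed in id order.
sample_log = [
    "T-1\tthank you very much\n",
    "H-1\t-0.21\tthank you very much\n",
    "D-1\t-0.21\tthank you very much\n",
    "T-0\thello everyone\n",
    "D-0\t-0.35\thello every one\n",
    "2023-01-01 00:00:00 | INFO | fairseq_cli.generate | done\n",
]
references, hypotheses = read_generate_output(sample_log)
print(references)  # ['hello everyone', 'thank you very much']
print(hypotheses)  # ['hello every one', 'thank you very much']
```

With a real output file, the same function can be applied to `open("/path/to/output_file", encoding="utf-8").readlines()`.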

Evaluation

We use the Python package JiWER to compute the word error rate. Gender-specific evaluations are performed by partitioning the test sets based on the MuST-Speaker resource.
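
As an illustration of the gender-specific evaluation, the sketch below computes a corpus-level word error rate separately for each speaker group. It is dependency-free on purpose: the WER comes from a plain word-level edit distance used as a stand-in for JiWER, with no text normalization, so it is not guaranteed to reproduce the reported scores. The `word_errors` and `wer_by_group` helpers, the group labels, and the toy sentences are hypothetical; in practice, the group of each segment would be derived from MuST-Speaker.

```python
from collections import defaultdict
from typing import Dict, List


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance between reference and hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def wer_by_group(refs: List[str], hyps: List[str], groups: List[str]) -> Dict[str, float]:
    """Corpus-level WER per group: total word errors / total reference words."""
    errors: Dict[str, int] = defaultdict(int)
    words: Dict[str, int] = defaultdict(int)
    for ref, hyp, group in zip(refs, hyps, groups):
        errors[group] += word_errors(ref, hyp)
        words[group] += len(ref.split())
    return {group: errors[group] / words[group] for group in words}


# Toy data: "A" and "B" are placeholder labels for two speaker groups.
refs = ["the cat sat on the mat", "hello world", "this is a test"]
hyps = ["the cat sat on mat", "hello word", "this is a test"]
groups = ["A", "B", "A"]
print(wer_by_group(refs, hyps, groups))  # {'A': 0.1, 'B': 0.5}
```

Pooling errors and reference words per group, instead of averaging sentence-level scores, keeps long and short segments weighted by their length.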

Citation

@inproceedings{fucci2023pitch,
      title={{No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation}}, 
      author={Dennis Fucci and Marco Gaido and Matteo Negri and Mauro Cettolo and Luisa Bentivogli},
      year={2023},
      booktitle="IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)",
      month = dec,
      address="Taipei, Taiwan"
}
