youssefshafik3/audio-matching-fft


This repository contains a simplified audio fingerprinting system that locates a short audio clip within a longer recording using FFT-based spectral analysis and cosine similarity.

🎓 Project Context

This project was developed by Computer Engineering undergraduate students for the COMM 401: Signals and Systems Theory course at the German University in Cairo (GUC).

The project demonstrates how a theoretical concept — the Fourier Transform — can be applied to solve a real-world problem inspired by apps like Shazam. A sliding-window algorithm scans a full audio track, comparing the frequency spectrum of each window against a query clip using cosine similarity to detect the best match.


🗂️ Repository Structure

audio-matching-fft/
│
├── audio/
│   └── typhoon.wav              # Input audio file
│
├── output/
│   ├── full_signal_time.png
│   ├── clip_time.png
│   ├── clip_freq.png
│   ├── similarity_score.png
│   └── clip_vs_detected.png
│
├── audio_matching.py
└── README.md

⚙️ How It Works

  1. Load & Preprocess: Read a .wav file, convert stereo to mono, and normalize the amplitude.
  2. Time-Domain Visualization: Plot the first 10% of both the full signal and the clip.
  3. Extract Query Clip: Slice a segment from the full signal using a start time and duration (in seconds).
  4. FFT Conversion: Compute the one-sided FFT magnitude spectrum of the clip.
  5. Frequency Visualization: Plot the clip's frequency spectrum.
  6. Sliding-Window Matching: Slide a clip-length window across the full signal with step = clip_length / 4 (75% overlap between consecutive windows); compute the cosine similarity between each window's FFT magnitude spectrum and the clip's.
  7. Best Match Detection: Identify the window with the highest similarity score.
  8. Visualize Results: Plot similarity score vs. time with markers for the true and detected positions.
  9. Compare Signals: Overlay the original clip and the detected segment.
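
The non-plotting steps above (1, 3, 4, 6 and 7) can be sketched as follows. This is a minimal illustration and not the contents of audio_matching.py: the function names are hypothetical, and a synthetic stereo chirp stands in for audio/typhoon.wav so the snippet runs without any input file. It assumes NumPy is installed.

```python
import numpy as np

def to_mono_normalized(samples):
    """Average channels to mono and scale to a peak amplitude of 1."""
    x = np.asarray(samples, dtype=np.float64)
    if x.ndim == 2:                      # shape (n_samples, n_channels)
        x = x.mean(axis=1)
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

def magnitude_spectrum(x):
    """One-sided FFT magnitude spectrum of a real-valued signal."""
    return np.abs(np.fft.rfft(x))

def find_clip(full, clip):
    """Scan `full` with clip-length windows (step = clip_length / 4)."""
    n = len(clip)
    step = max(1, n // 4)                # 75% overlap between windows
    clip_mag = magnitude_spectrum(clip)
    clip_norm = np.linalg.norm(clip_mag)
    starts = np.arange(0, len(full) - n + 1, step)
    scores = np.empty(len(starts))
    for i, s in enumerate(starts):
        win_mag = magnitude_spectrum(full[s:s + n])
        denom = clip_norm * np.linalg.norm(win_mag)
        scores[i] = np.dot(clip_mag, win_mag) / denom if denom > 0 else 0.0
    best = int(np.argmax(scores))        # window with the highest score
    return int(starts[best]), scores, starts

# Synthetic stand-in for a recording (assumption, not the project's data):
# a 10 s stereo chirp whose frequency rises over time, sampled at 8 kHz.
fs = 8000
t = np.arange(10 * fs) / fs
chirp = np.sin(2 * np.pi * (200 * t + 50 * t ** 2))
stereo = np.column_stack([chirp, 0.5 * chirp])

full = to_mono_normalized(stereo)

# Extract a 2 s query clip starting at 3 s.
clip_start_s, clip_len_s = 3.0, 2.0
i0 = int(clip_start_s * fs)
clip = full[i0:i0 + int(clip_len_s * fs)]

best_start, scores, starts = find_clip(full, clip)
print(f"Detected position: {best_start / fs:.2f} s")
print(f"Best similarity score: {scores.max():.4f}")
```

In this toy setup the clip start (3 s) is a multiple of the step (0.5 s), so one window coincides with the clip exactly and scores 1.0. When the true start falls between window positions, the best score is lower and the detected position is only accurate to within the step size.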

Similarity Metric

Cosine similarity is used to compare frequency spectra:

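$$\text{similarity} = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

Here A and B are the FFT magnitude spectra of the clip and of the current window. Since magnitude spectra are non-negative, the score lies between 0.0 and 1.0: a score of 1.0 means the two spectra have the same shape (a perfect match, regardless of overall loudness), and 0.0 means they share no frequency content.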
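
As a small illustration of the metric, the sketch below evaluates the formula on toy four-bin "spectra". The function name and the vectors are made up for this example and are not taken from the project's code; NumPy is assumed to be available.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

spectrum = [0.0, 3.0, 4.0, 0.0]   # energy in bins 1 and 2
louder   = [0.0, 6.0, 8.0, 0.0]   # same shape, twice the amplitude
partial  = [0.0, 3.0, 0.0, 4.0]   # shares only bin 1
disjoint = [5.0, 0.0, 0.0, 2.0]   # no bins in common

print(cosine_similarity(spectrum, louder))    # same shape
print(cosine_similarity(spectrum, partial))   # partial overlap
print(cosine_similarity(spectrum, disjoint))  # no overlap
```

The first comparison gives 1.0 because scaling a spectrum does not change its direction, the second gives 9 / 25 = 0.36, and the third gives 0.0. This scale invariance is one reason cosine similarity suits spectra recorded at different volumes.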


📊 Sample Results

Console Output:

| Field | Value |
| --- | --- |
| Sampling frequency | 44100 Hz |
| Length of full signal | ~150 seconds |
| Clip length | 30 seconds |
| Original clip position | 5 seconds |
| Detected position | 7.50 seconds |
| Best similarity score | 0.9763 |
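
One plausible reading of the 2.5-second gap between the original and detected positions is the window step itself. With a 30-second clip and step = clip_length / 4, windows start every 7.5 seconds, so no window starts at exactly 5 seconds. The arithmetic below is a sketch of that reasoning under the reported values, taking the full length as exactly 150 seconds for illustration; it is not output from the project.

```python
clip_length_s = 30.0
true_start_s = 5.0
full_length_s = 150.0            # README reports "~150 seconds"; assumed exact here

step_s = clip_length_s / 4       # 7.5 s between window starts
n_windows = int((full_length_s - clip_length_s) // step_s) + 1
starts = [i * step_s for i in range(n_windows)]

# Window start closest to the true clip position.
nearest = min(starts, key=lambda s: abs(s - true_start_s))

# Portion of that window that overlaps the true clip.
overlap_s = clip_length_s - abs(nearest - true_start_s)

print(f"step = {step_s} s, candidate starts = {starts[:4]} ...")
print(f"nearest window start = {nearest} s")
print(f"overlap with true clip = {overlap_s} s ({overlap_s / clip_length_s:.1%})")
```

The nearest window start is 7.5 seconds, and that window shares 27.5 of its 30 seconds with the true clip. This is consistent with a best score that is high but below 1.0. A smaller step would tighten the position estimate at the cost of more FFTs.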

Full Signal — Time Domain
(Figure: output/full_signal_time.png)

Clip — Time Domain
(Figure: output/clip_time.png)

Clip — Frequency Domain
(Figure: output/clip_freq.png)

Similarity Score vs. Time
(Figure: output/similarity_score.png)

Clip vs. Detected Segment
(Figure: output/clip_vs_detected.png)


👥 Team

  • Belal Nader Youssef
  • Mariam Mohammed El Gazzaz
  • Youssef Sameh Emil Shafik
