sairaawahid/Data-Science-in-Python-Data-Prep-and-EDA

Data Science in Python: Data Preparation & EDA

Transforming raw data into meaningful insight. One dataset at a time.

Overview

This repository captures my journey of building a strong foundation in data science through project-based learning in Python.

Rather than approaching data science as a collection of tools, this work focuses on developing a structured way of thinking: how to move from ambiguous problems to clear, data-driven insights.

Working as a Jr. Data Scientist on a simulated business case (Maven Music), I analyze customer data to understand patterns in churn and user behavior. Each notebook reflects not just technical execution, but the reasoning and decision-making behind it.
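
As a minimal illustration of the kind of question this involves, here is a sketch of a churn-rate breakdown by segment in pandas. The table, the `subscription_plan` and `cancelled` columns, and the values are made up for the example. They are not taken from the actual Maven Music dataset.

```python
import pandas as pd

# Hypothetical customer table; column names and values are illustrative only,
# not the real Maven Music data.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "subscription_plan": ["Basic", "Premium", "Basic", "Premium", "Basic", "Basic"],
    "cancelled": [1, 0, 1, 0, 0, 1],  # 1 = customer churned
})

# Churn rate per plan: the mean of a 0/1 flag is the share of churned customers.
churn_by_plan = (
    customers.groupby("subscription_plan")["cancelled"]
    .mean()
    .sort_values(ascending=False)
)
print(churn_by_plan)
```

Encoding churn as a 0/1 flag keeps the calculation to a single `groupby(...).mean()`, which is easy to repeat for any other segment column.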

What This Work Represents

This repository is not just coursework. It represents:

  • Learning how to translate business problems into analytical questions
  • Developing the ability to work with imperfect, real-world data
  • Building intuition for what to explore, what to question, and what to ignore
  • Practicing how to extract signal from noise through structured analysis

Core Focus Areas

  • Understanding the Data Science Workflow
  • Supervised vs Unsupervised Learning
  • Project Scoping Techniques
  • Data Collection from Multiple Sources
  • Data Cleaning and Preprocessing using Pandas
  • Exploratory Data Analysis (EDA)
    • Filtering and grouping data
    • Data visualization
    • Correlation
  • Data Preparation for Machine Learning
    • Feature engineering
    • Dataset merging & transformation
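
A compact sketch of how several of these areas fit together in pandas (cleaning, grouping, merging, filtering and correlation) might look like the following. Both tables, their columns, and the cleaning choices (dropping exact duplicates, filling a missing fee with the median) are assumptions for illustration, not the exact steps used in the notebooks.

```python
import pandas as pd

# Two hypothetical sources (a customer table and a listening log); the names,
# columns and values are made up for illustration.
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],                             # customer 2 is duplicated
    "monthly_fee": ["$9.99", "$4.99", "$4.99", None, "$9.99"],  # text, one value missing
    "cancelled": [1, 0, 0, 1, 0],                               # 1 = churned
})
listening = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 4, 4, 4],
    "minutes": [30, 45, 120, 10, 200, 150, 90],
})

# Cleaning: remove exact duplicate rows, convert "$9.99" strings to numbers,
# and fill the missing fee with the median fee.
customers = customers.drop_duplicates()
customers["monthly_fee"] = pd.to_numeric(
    customers["monthly_fee"].str.replace("$", "", regex=False)
)
customers["monthly_fee"] = customers["monthly_fee"].fillna(customers["monthly_fee"].median())

# Grouping: total listening minutes per customer.
usage = listening.groupby("customer_id", as_index=False)["minutes"].sum()

# Merging: attach usage to each customer (a left join keeps every customer).
df = customers.merge(usage, on="customer_id", how="left")

# Filtering: customers who listened for more than 100 minutes.
heavy = df[df["minutes"] > 100]

# Correlation between the numeric columns.
corr = df[["monthly_fee", "minutes", "cancelled"]].corr()
print(corr)
```

Aggregating the listening log before merging keeps the result at one row per customer, so the correlation is computed over customers rather than over individual listening sessions.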

Tools & Technologies

  • Python
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Google Colab

Course Pathway

The work in this repository follows a structured progression:

  1. Foundations of Data Science
  2. Problem Scoping & Analytical Thinking
  3. Data Gathering & Cleaning
  4. Exploratory Data Analysis
  5. Project-Based Application
  6. Preparing Data for Machine Learning
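
For the final step, the following is a minimal sketch of turning a cleaned, customer-level table into model-ready features. The columns, the snapshot date and the use of one-hot encoding via `pd.get_dummies` are my assumptions for illustration, not the exact approach taken in the notebooks.

```python
import pandas as pd

# Hypothetical merged, customer-level table; columns are illustrative assumptions.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "subscription_plan": ["Basic", "Premium", "Basic", "Premium"],
    "member_since": pd.to_datetime(["2022-01-15", "2021-06-01", "2023-03-10", "2020-11-20"]),
    "minutes": [75, 120, 10, 440],
    "sessions": [3, 4, 1, 8],
    "cancelled": [1, 0, 1, 0],  # target: 1 = churned
})

reference_date = pd.Timestamp("2024-01-01")  # assumed snapshot date

# Feature engineering: derive numeric features from existing columns.
df["tenure_days"] = (reference_date - df["member_since"]).dt.days
df["minutes_per_session"] = df["minutes"] / df["sessions"]

# One-hot encode the categorical plan column so models receive numeric input;
# drop_first=True removes the redundant "Basic" indicator.
df = pd.get_dummies(df, columns=["subscription_plan"], drop_first=True, dtype=int)

# Split into features X and target y, dropping the identifier and raw date.
X = df.drop(columns=["customer_id", "member_since", "cancelled"])
y = df["cancelled"]
print(X.head())
```

Identifiers and raw timestamps are dropped from `X` because a model cannot use them directly; the date's information is kept in the derived `tenure_days` feature instead.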

Key Outcomes

Through this work, I have developed the ability to:

  • Approach data problems with a clear analytical framework
  • Clean, structure and validate messy datasets
  • Perform EDA that leads to meaningful, actionable insights
  • Prepare datasets for downstream machine learning tasks
  • Communicate findings through structured, visual narratives

Who This Is For

  • Aspiring Data Scientists building foundational skills
  • Data Analysts transitioning into more advanced roles
  • Python learners aiming to work with real-world datasets

Note

This repository is continuously evolving.

About

A collection of Python data analysis projects focused on data preparation and EDA, covering real-world workflows from data gathering and cleaning through feature engineering, visualization and model preparation for machine learning.
