🚀 Real‑Time Ride Data Pipeline & Mobility Analytics Platform

A full‑stack, real‑time data engineering system built with streaming, warehousing, modeling, and analytics.

🧰 Tech Stack

Layer	Technology
Event Ingestion	• FastAPI • Pydantic
Streaming Backbone	• Kafka • Zookeeper
Real‑Time Processing	• Python • aiokafka • asyncio
Warehouse	• PostgreSQL
Modeling	• dbt
Dashboards	• Streamlit
Infrastructure	• Docker • docker‑compose

🧠 Overview

This project implements a real‑time mobility analytics platform that ingests streaming event data, processes it in near real time, models it in a warehouse, and exposes it through interactive dashboards.

It is designed as an end‑to‑end data engineering system, showcasing:

Event ingestion via an HTTP API
Streaming transport using Kafka
Real‑time ETL with Python and aiokafka
Analytical storage in PostgreSQL
Transformation and modeling with dbt
Visualization and storytelling with Streamlit
Fully containerized infrastructure with Docker

🔄 Architecture / Data Flow

1. Data Generator

Continuously emits synthetic mobility events (request → accept → complete, surge, fare, timestamps).
This simulates a high‑velocity event stream for testing real‑time pipelines.
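
As a rough illustration of the generator described above, the sketch below emits the request → accept → complete lifecycle for one ride. The field names (`event_id`, `ride_id`, `surge_multiplier`, and so on), the value ranges, and the `drop_off` parameter are assumptions for illustration, not the contents of `data_generator/main.py`.

```python
import random
import uuid
from datetime import datetime, timedelta, timezone

# Lifecycle stages named in the text; everything else here is hypothetical.
LIFECYCLE = ("request", "accept", "complete")


def generate_ride(rng: random.Random, start: datetime, drop_off: float = 0.2) -> list:
    """Return the ordered events for one synthetic ride.

    After each stage the ride is abandoned with probability `drop_off`,
    so some rides never reach "accept" or "complete".
    """
    ride_id = str(uuid.UUID(int=rng.getrandbits(128)))  # seedable, repeatable ID
    surge = rng.choice([1.0, 1.0, 1.25, 1.5, 2.0])
    fare = round(rng.uniform(5.0, 40.0) * surge, 2)
    events = []
    ts = start
    for event_type in LIFECYCLE:
        events.append({
            "event_id": str(uuid.UUID(int=rng.getrandbits(128))),
            "ride_id": ride_id,
            "event_type": event_type,
            "surge_multiplier": surge,
            "fare": fare,
            "event_time": ts.isoformat(),
        })
        if rng.random() < drop_off:
            break  # later stages never happen for this ride
        ts += timedelta(seconds=rng.randint(30, 600))
    return events
```

Calling `generate_ride(random.Random(42), datetime.now(timezone.utc))` in a loop would give a repeatable stream; seeding the generator makes test runs comparable.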

2. FastAPI Ingestion Layer

Receives incoming events over HTTP and publishes them to Kafka.
Pydantic schemas enforce strict validation and schema consistency before data enters the streaming system.
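
The validation step could look roughly like the following Pydantic model. This is a minimal sketch: the field names, types, and constraints are assumptions, and the actual definitions live in `ingestion_api/schemas.py`.

```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class RideEvent(BaseModel):
    """Hypothetical event schema; field names are illustrative only."""
    event_id: str
    ride_id: str
    event_type: Literal["request", "accept", "complete"]
    surge_multiplier: float = Field(ge=1.0)  # assume surge never drops below 1x
    fare: float = Field(ge=0)                # reject negative fares
    event_time: datetime                     # ISO 8601 strings are parsed


def validate_event(payload: dict):
    """Return (event, None) when the payload is valid, else (None, error text)."""
    try:
        return RideEvent(**payload), None
    except ValidationError as exc:
        return None, str(exc)
```

In a FastAPI route, declaring a parameter of type `RideEvent` makes the framework run this validation before the handler body executes, so malformed events are rejected before anything is published to Kafka.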

3. Kafka + Zookeeper

Kafka acts as the durable, high‑throughput event log.
Events are written to the raw-rides topic, enabling scalable, decoupled processing.
Zookeeper coordinates Kafka brokers and manages metadata.
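
Kafka stores keys and values as bytes, so events need to be serialized before they reach the `raw-rides` topic. The sketch below shows one possible encoding. Keying by `ride_id` is an assumption: with Kafka's default partitioner, records that share a key are routed to the same partition (as long as the partition count is unchanged), which keeps one ride's events in order.

```python
import json

TOPIC = "raw-rides"  # topic name taken from the text above


def to_kafka_record(event: dict) -> tuple:
    """Encode an event as a (key, value) pair of bytes.

    The key is the ride_id (an assumed field name); the value is compact JSON.
    """
    key = event["ride_id"].encode("utf-8")
    value = json.dumps(event, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return key, value


def from_kafka_value(value: bytes) -> dict:
    """Decode a record value back into an event dict (the consumer side)."""
    return json.loads(value.decode("utf-8"))
```

With aiokafka, the pair could then be passed to a started producer as `await producer.send_and_wait(TOPIC, value=value, key=key)`.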

4. Stream Processor (Python + aiokafka)

Consumes events from Kafka in real time, applies operational cleaning and enrichment, and loads them into PostgreSQL.
This forms the real‑time ETL layer of the system.
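
A cleaning and enrichment step of this kind might be sketched as a pure function, which keeps it easy to test independently of Kafka and Postgres. The rules below (dropping unknown event types and negative fares, normalizing timestamps to UTC, deriving `is_surge` and `event_hour`) are assumptions, not the logic in `stream_processor/transformations.py`.

```python
from datetime import datetime, timezone
from typing import Optional

VALID_TYPES = {"request", "accept", "complete"}


def clean_and_enrich(event: dict) -> Optional[dict]:
    """Return a cleaned, enriched copy of the event, or None to drop it."""
    event_type = str(event.get("event_type", "")).strip().lower()
    if event_type not in VALID_TYPES:
        return None
    fare = float(event.get("fare", 0.0))
    if fare < 0:
        return None
    surge = float(event.get("surge_multiplier", 1.0))

    event_time = datetime.fromisoformat(event["event_time"])
    if event_time.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        event_time = event_time.replace(tzinfo=timezone.utc)
    event_time = event_time.astimezone(timezone.utc)

    return {
        **event,
        "event_type": event_type,
        "fare": round(fare, 2),
        "surge_multiplier": surge,
        "event_time": event_time.isoformat(),
        "is_surge": surge > 1.0,        # derived flag
        "event_hour": event_time.hour,  # derived UTC hour of day
    }
```

In the consumer loop, each decoded message would pass through this function, and only non-`None` results would be written to the warehouse.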

5. PostgreSQL Warehouse

Serves as the analytical storage layer.
Stores:

fact_rides — raw, high‑granularity event data
dim_users, dim_locations — lookup dimensions
analytics.* — dbt‑generated models

Postgres provides ACID transactions and mature indexing, and both dbt and Streamlit can connect to it directly.
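
Because a Kafka consumer can see the same message more than once after a restart, the load into `fact_rides` benefits from being idempotent. The sketch below illustrates that idea with `ON CONFLICT ... DO NOTHING`, which PostgreSQL supports. To stay runnable without a database server it uses Python's built-in `sqlite3` as a stand-in, and the column list is an assumption rather than the schema in `db/init.sql`.

```python
import sqlite3

# Hypothetical columns; SQLite types stand in for the Postgres ones.
DDL = """
CREATE TABLE IF NOT EXISTS fact_rides (
    event_id         TEXT PRIMARY KEY,
    ride_id          TEXT NOT NULL,
    event_type       TEXT NOT NULL,
    fare             REAL,
    surge_multiplier REAL,
    event_time       TEXT NOT NULL
)
"""

INSERT = """
INSERT INTO fact_rides
    (event_id, ride_id, event_type, fare, surge_multiplier, event_time)
VALUES
    (:event_id, :ride_id, :event_type, :fare, :surge_multiplier, :event_time)
ON CONFLICT (event_id) DO NOTHING
"""


def load_events(conn: sqlite3.Connection, events: list) -> int:
    """Insert events, skipping event_ids already stored; return rows added."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(INSERT, events)
    return conn.total_changes - before
```

Replaying the same batch then leaves the table unchanged, so the processor can safely reprocess messages after a failure.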

6. dbt (Data Build Tool)

Transforms raw warehouse tables into analytics‑ready models.

Staging models (stg_ride_events) standardize and clean raw data
Mart models (fct_ride_conversion) compute business‑level metrics
Tests + documentation ensure data quality and lineage
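
The dbt models themselves are SQL, but the kind of metric a conversion mart computes can be illustrated in a few lines of Python. This is a sketch under assumptions: the stage names and the three rates below are illustrative and are not taken from `fct_ride_conversion.sql`.

```python
from collections import defaultdict

STAGES = ("request", "accept", "complete")


def ride_conversion(events: list) -> dict:
    """Count distinct rides per stage and compute step-to-step conversion."""
    rides_by_stage = defaultdict(set)
    for e in events:
        if e["event_type"] in STAGES:
            rides_by_stage[e["event_type"]].add(e["ride_id"])
    requested, accepted, completed = (len(rides_by_stage[s]) for s in STAGES)

    def rate(numerator: int, denominator: int):
        # None rather than a division error when a stage has no rides.
        return round(numerator / denominator, 4) if denominator else None

    return {
        "requested": requested,
        "accepted": accepted,
        "completed": completed,
        "accept_rate": rate(accepted, requested),
        "completion_rate": rate(completed, accepted),
        "overall_conversion": rate(completed, requested),
    }
```

Counting distinct `ride_id` values per stage, rather than raw event rows, keeps duplicate events from inflating the funnel.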

7. Streamlit Dashboards

Interactive dashboards built directly on top of the warehouse and dbt models.

Includes:

Conversion funnel
Marketplace KPIs
Route intelligence
Surge pricing insights
Data quality monitoring

📦 Project Structure


realtime-mobility-streaming-kafka/
├── data_generator/                  # Real-time ride event generator
│   └── main.py
│
├── db/                              # Database bootstrap & reference queries
│   ├── init.sql                     # DDL + seed + initial setup for Postgres
│   └── queries/                     # Reference Queries
│       ├── 0_data_quality.sql
│       ├── 1_basic_kpis.sql
│       └── 2_funnel_analysis.sql
│
├── dbt_project/                     # dbt analytics project
│   └── mobility_analytics/
│       ├── dbt_project.yml          # config
│       └── models/
│           ├── marts/               # Business logic models
│           │   └── fct_ride_conversion.sql
│           └── staging/             # cleaning of raw tables
│               └── stg_ride_events.sql
│
├── ingestion_api/                   # FastAPI ingestion service
│   ├── kafka_producer.py            # Kafka producer logic
│   ├── main.py                      # API endpoints (ingest → Kafka)
│   └── schemas.py                   # Pydantic validation models
│
├── pages/                           # Streamlit dashboard pages
│   ├── conversion_funnel.py
│   ├── marketplace_kpis.py
│   ├── route_intelligence.py
│   └── surge_pricing.py
│
├── stream_processor/                # Kafka consumer → Postgres writer
│   ├── processor.py                 # Real-time ETL (consume, transform, load)
│   └── transformations.py           # cleaning & enrichment logic
│
├── utils/                           # Shared utilities
│   └── db.py                        # PostgreSQL connection for dashboards
│
├── dashboard.py                     # Main Streamlit entrypoint
├── docker-compose.yml               # Kafka + Zookeeper + Postgres infra
└── requirements.txt                 # project dependencies

How to Run the Project

Start Kafka, Zookeeper, PostgreSQL, and Kafka‑UI

docker-compose up -d

Start the FastAPI ingestion service

uvicorn ingestion_api.main:app --host 0.0.0.0 --port 8000 --reload

Start the stream processor

python stream_processor/processor.py

Start the real-time data generator

python data_generator/main.py

Build analytics models with dbt

cd dbt_project/mobility_analytics
dbt run
cd ../..

Launch the Streamlit dashboard

streamlit run dashboard.py

Services Access

📘 FastAPI Swagger UI

Access the interactive API docs here: http://localhost:8000/docs
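
To check the ingestion service from a script rather than the Swagger UI, a single test event can be posted with the standard library. The `/events` path and the payload fields are hypothetical; the real route and schema are listed at http://localhost:8000/docs.

```python
import json
import urllib.request

# Hypothetical endpoint path; confirm the real one in the Swagger UI.
API_URL = "http://localhost:8000/events"


def build_ingest_request(event: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request for one event without sending it."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(event: dict) -> int:
    """Send the event and return the HTTP status code (the API must be running)."""
    with urllib.request.urlopen(build_ingest_request(event), timeout=5) as resp:
        return resp.status
```

Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a rejected payload surfaces as an exception rather than a return value.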

🧭 Kafka UI (Optional)

Inspect topics, partitions, and consumer lag here: http://localhost:8080

📊 Streamlit Dashboard

Open the analytics dashboard here: http://localhost:8501

🚀 Project Enhancements (Future Work)

Real‑time ML models (fraud detection, ETA prediction, anomaly detection)
Monitoring & observability with Prometheus + Grafana
CI/CD pipelines for API, processor, dbt, and dashboards
Expanded dbt models for deeper marketplace, supply‑demand, and surge analytics

👩‍💻 Author

Shweta — Full‑Stack Data Engineer
Real‑Time Streaming • Warehousing • Analytics Engineering • System Design

📄 License

This project is licensed under CC BY‑NC‑ND 4.0.
You may share it with attribution, but no commercial use and no modifications are allowed.
