A full-stack, real-time data engineering system built with streaming, warehousing, modeling, and analytics.
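| Layer | Technology |
|---|---|
| Event Ingestion | FastAPI, Pydantic |
| Streaming Backbone | Kafka, Zookeeper |
| Real-Time Processing | Python, aiokafka |
| Warehouse | PostgreSQL |
| Modeling | dbt |
| Dashboards | Streamlit |
| Infrastructure | Docker, Docker Compose |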
This project implements a real-time mobility analytics platform that ingests streaming event data, processes it in near real time, models it in a warehouse, and exposes it through interactive dashboards.
It is designed as an end-to-end data engineering system, showcasing:
- Event ingestion via an HTTP API
- Streaming transport using Kafka
- Real-time ETL with Python and aiokafka
- Analytical storage in PostgreSQL
- Transformation and modeling with dbt
- Visualization and storytelling with Streamlit
- Fully containerized infrastructure with Docker
The data generator continuously emits synthetic mobility events (the request → accept → complete lifecycle, plus surge, fare, and timestamps).
This simulates a high-velocity event stream for testing real-time pipelines.
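As a rough illustration of what such a generator can look like, the sketch below builds the events for one synthetic ride. The field names (`ride_id`, `event_type`, `surge_multiplier`, `fare`, `event_ts`), the location list, and the 25% drop-off probability are all assumptions made for this example, not values taken from `data_generator/main.py`.

```python
import random
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical values for illustration; the real generator may differ.
LOCATIONS = ["airport", "downtown", "university", "harbor"]
LIFECYCLE = ["request", "accept", "complete"]


def generate_ride_events(rng: random.Random, start: datetime) -> list:
    """Return the events for one synthetic ride, in lifecycle order.

    A ride may drop out of the funnel after any stage, so the list
    contains between one and three events.
    """
    ride_id = str(uuid.UUID(int=rng.getrandbits(128)))
    surge = rng.choice([1.0, 1.0, 1.2, 1.5, 2.0])
    base_fare = round(rng.uniform(5.0, 40.0), 2)
    pickup, dropoff = rng.sample(LOCATIONS, 2)

    events = []
    ts = start
    for stage in LIFECYCLE:
        events.append(
            {
                "ride_id": ride_id,
                "event_type": stage,
                "pickup": pickup,
                "dropoff": dropoff,
                "surge_multiplier": surge,
                "fare": round(base_fare * surge, 2),
                "event_ts": ts.isoformat(),
            }
        )
        # Assumed 25% chance that the ride stops progressing after this stage.
        if rng.random() < 0.25:
            break
        ts += timedelta(seconds=rng.randint(30, 600))
    return events
```

Passing a seeded `random.Random` keeps runs reproducible, which helps when comparing pipeline output against known input.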
The FastAPI ingestion service receives incoming events over HTTP and publishes them to Kafka.
Pydantic schemas enforce strict validation and schema consistency before data enters the streaming system.
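A minimal sketch of this kind of validation is shown below. The `RideEvent` model, its fields, and the `parse_event` helper are hypothetical and only stand in for whatever `ingestion_api/schemas.py` actually defines.

```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class RideEvent(BaseModel):
    """Assumed event shape; field names and constraints are illustrative."""

    ride_id: str
    user_id: str
    event_type: Literal["request", "accept", "complete"]
    surge_multiplier: float = Field(ge=1.0)
    fare: float = Field(ge=0)
    event_ts: datetime


def parse_event(payload: dict):
    """Return (event, None) for a valid payload, else (None, error list)."""
    try:
        return RideEvent(**payload), None
    except ValidationError as exc:
        return None, exc.errors()
```

In a FastAPI route, declaring a model like this as the type of the request-body parameter makes FastAPI run the same validation and answer invalid payloads with a 422 response, so nothing malformed reaches the producer.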
Kafka acts as the durable, high-throughput event log.
Events are written to the `raw-rides` topic, enabling scalable, decoupled processing.
Zookeeper coordinates Kafka brokers and manages metadata.
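To make the producer side concrete, here is a small sketch of how an event could be turned into the key and value bytes of a Kafka record. Keying by `ride_id` and encoding as JSON are assumptions for this example; the project's `kafka_producer.py` may serialize differently.

```python
import json

TOPIC = "raw-rides"  # topic name taken from the text above


def to_kafka_record(event: dict) -> tuple:
    """Serialize an event into the (key, value) byte pair a producer sends.

    Keying by ride_id is an assumption: with Kafka's default partitioner,
    records that share a key go to the same partition, which keeps the
    events of a single ride in order.
    """
    key = str(event["ride_id"]).encode("utf-8")
    value = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return key, value


def from_kafka_value(value: bytes) -> dict:
    """Decode a record value back into a dict on the consumer side."""
    return json.loads(value.decode("utf-8"))
```

With aiokafka, the resulting pair would typically be passed as `await producer.send_and_wait(TOPIC, value=value, key=key)`.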
The stream processor consumes events from Kafka in real time, applies operational cleaning and enrichment, and loads them into PostgreSQL.
This forms the real-time ETL layer of the system.
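The cleaning and enrichment step can be pictured as a pure function applied to each consumed message before it is written to the warehouse. The rules below (dropping unknown event types, normalizing timestamps to UTC, flooring surge at 1.0, deriving `is_surge` and `event_hour`) are assumptions chosen for illustration, not the actual contents of `transformations.py`.

```python
from datetime import datetime, timezone
from typing import Optional

VALID_TYPES = {"request", "accept", "complete"}


def clean_and_enrich(raw: dict) -> Optional[dict]:
    """Return a warehouse-ready row, or None if the event should be dropped."""
    ride_id = str(raw.get("ride_id") or "").strip()
    event_type = str(raw.get("event_type") or "").strip().lower()
    if not ride_id or event_type not in VALID_TYPES:
        return None

    try:
        ts = datetime.fromisoformat(raw["event_ts"])
        fare = float(raw.get("fare", 0.0))
        surge = float(raw.get("surge_multiplier", 1.0))
    except (KeyError, TypeError, ValueError):
        return None
    if fare < 0:
        return None

    # Cleaning: treat naive timestamps as UTC and convert everything to UTC.
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    ts = ts.astimezone(timezone.utc)
    surge = max(surge, 1.0)  # a multiplier below 1.0 is treated as no surge

    # Enrichment: derived columns that simplify downstream queries.
    return {
        "ride_id": ride_id,
        "event_type": event_type,
        "event_ts": ts,
        "fare": fare,
        "surge_multiplier": surge,
        "is_surge": surge > 1.0,
        "event_hour": ts.hour,
    }
```

Keeping this logic free of Kafka and database calls means it can be unit tested without any running infrastructure.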
PostgreSQL serves as the analytical storage layer.
It stores:
- `fact_rides`: raw, high-granularity event data
- `dim_users`, `dim_locations`: lookup dimensions
- `analytics.*`: dbt-generated models
Postgres provides ACID guarantees, strong indexing, and seamless integration with dbt and Streamlit.
dbt transforms raw warehouse tables into analytics-ready models.
- Staging models (`stg_rides`) standardize and clean raw data
- Mart models (`fct_ride_conversion`) compute business-level metrics
- Tests + documentation ensure data quality and lineage
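To show the kind of metric a conversion mart might compute, the sketch below runs a funnel query over a tiny staging table. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and both the SQL and the two-column table layout are assumptions, not the actual contents of `fct_ride_conversion.sql`.

```python
import sqlite3

# Assumed funnel logic: count distinct rides that reached each stage.
FUNNEL_SQL = """
SELECT
    COUNT(DISTINCT CASE WHEN event_type = 'request'  THEN ride_id END) AS requested,
    COUNT(DISTINCT CASE WHEN event_type = 'accept'   THEN ride_id END) AS accepted,
    COUNT(DISTINCT CASE WHEN event_type = 'complete' THEN ride_id END) AS completed
FROM stg_rides
"""


def ride_conversion(rows) -> dict:
    """Compute funnel counts and rates from (ride_id, event_type) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE stg_rides (ride_id TEXT, event_type TEXT)")
        conn.executemany("INSERT INTO stg_rides VALUES (?, ?)", rows)
        requested, accepted, completed = conn.execute(FUNNEL_SQL).fetchone()
    finally:
        conn.close()
    return {
        "requested": requested,
        "accepted": accepted,
        "completed": completed,
        # Guard against division by zero when no rides were requested.
        "acceptance_rate": accepted / requested if requested else 0.0,
        "completion_rate": completed / requested if requested else 0.0,
    }
```

Counting distinct `ride_id` values rather than rows keeps duplicate events from inflating the funnel.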
Streamlit provides interactive dashboards built directly on top of the warehouse and dbt models.
They include:
- Conversion funnel
- Marketplace KPIs
- Route intelligence
- Surge pricing insights
- Data quality monitoring
realtime-mobility-streaming-kafka/
├── data_generator/              # Real-time ride event generator
│   └── main.py
│
├── db/                          # Database bootstrap & reference queries
│   ├── init.sql                 # DDL + seed + initial setup for Postgres
│   └── queries/                 # Reference queries
│       ├── 0_data_quality.sql
│       ├── 1_basic_kpis.sql
│       └── 2_funnel_analysis.sql
│
├── dbt_project/                 # dbt analytics project
│   ├── dbt_project.yml          # config
│   └── models/
│       ├── marts/               # Business logic models
│       │   └── fct_ride_conversion.sql
│       └── staging/             # cleaning of raw tables
│           └── stg_ride_events.sql
│
├── ingestion_api/               # FastAPI ingestion service
│   ├── kafka_producer.py        # Kafka producer logic
│   ├── main.py                  # API endpoints (ingest → Kafka)
│   └── schemas.py               # Pydantic validation models
│
├── pages/                       # Streamlit dashboard pages
│   ├── conversion_funnel.py
│   ├── marketplace_kpis.py
│   ├── route_intelligence.py
│   └── surge_pricing.py
│
├── stream_processor/            # Kafka consumer → Postgres writer
│   ├── processor.py             # Real-time ETL (consume, transform, load)
│   └── transformations.py       # cleaning & enrichment logic
│
├── utils/                       # Shared utilities
│   └── db.py                    # PostgreSQL connection for dashboards
│
├── dashboard.py                 # Main Streamlit entrypoint
├── docker-compose.yml           # Kafka + Zookeeper + Postgres infra
└── requirements.txt             # project dependencies
- Start Kafka, Zookeeper, PostgreSQL, and Kafka-UI:
  `docker-compose up -d`
- Start the FastAPI ingestion service:
  `uvicorn ingestion_api.main:app --host 0.0.0.0 --port 8000 --reload`
- Start the stream processor:
  `python stream_processor/processor.py`
- Start the real-time data generator:
  `python data_generator/main.py`
- Build analytics models with dbt:
  `cd dbt_project/mobility_analytics`
  `dbt run`
  `cd ../..`
- Launch the Streamlit dashboard:
  `streamlit run dashboard.py`

Access the interactive API docs here: http://localhost:8000/docs
Inspect topics, partitions, and consumer lag here: http://localhost:8080
Open the analytics dashboard here: http://localhost:8501
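Once the services are up, a quick way to check the ingestion path is to post a single event by hand. The sketch below uses only the standard library. The `/events` route is a placeholder assumption, so look up the real path in the API docs at http://localhost:8000/docs before using it.

```python
import json
import urllib.request


def build_ingest_request(
    event: dict,
    base_url: str = "http://localhost:8000",
    path: str = "/events",  # hypothetical route; check /docs for the real one
) -> urllib.request.Request:
    """Build a JSON POST request for the ingestion API without sending it."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(event: dict) -> int:
    """POST the event and return the HTTP status code.

    Requires the ingestion service to be running locally.
    """
    with urllib.request.urlopen(build_ingest_request(event), timeout=5) as resp:
        return resp.status
```

If the call succeeds, the event should then be visible in the `raw-rides` topic through Kafka-UI.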
- Real-time ML models (fraud detection, ETA prediction, anomaly detection)
- Monitoring & observability with Prometheus + Grafana
- CI/CD pipelines for API, processor, dbt, and dashboards
- Expanded dbt models for deeper marketplace, supply-demand, and surge analytics
Shweta, Full-Stack Data Engineer
Real-Time Streaming • Warehousing • Analytics Engineering • System Design
This project is licensed under CC BY-NC-ND 4.0.
You may share it with attribution, but no commercial use and no modifications are allowed.



