An end-to-end Machine Learning solution designed to estimate real estate prices in New York City. The project moves from raw data scraping/collection to a fully functional interactive web application, helping users understand market trends and property values.
- Model Accuracy ($R^2$): 86% on unseen data.
- Error Metric (MAPE): 28% Mean Absolute Percentage Error.
- Model Validation on Real-World Data (Zillow Test Cases):
| Locality | Property Type | Zip Code | Coordinates (Lat, Long) | Sqft | Baths | Beds | Actual Price (Zillow) | Model Prediction | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| Manhattan | Condo | 10021 | 40.7661, -73.9641 | 2,200 | 3 | 3 | $3,850,000 | $3,371,963 | 87.5% |
| Brooklyn | Townhouse | 11211 | 40.7105, -73.9625 | 1,150 | 2 | 2 | $1,250,000 | $1,271,603 | 98.3% |
| Queens | Multi-family | 11103 | 40.7668, -73.9161 | 950 | 1 | 2 | $875,000 | $600,413 | 68.6% |
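The source does not state how the Accuracy column in the table above was computed. As a rough sketch, assuming per-row accuracy means one minus the absolute percentage error, the figures can be approximately recomputed as below. The function names are illustrative and not taken from the project, and the last digit can differ from the table depending on rounding.

```python
# (locality, actual Zillow price, model prediction), copied from the table above.
CASES = [
    ("Manhattan", 3_850_000, 3_371_963),
    ("Brooklyn", 1_250_000, 1_271_603),
    ("Queens", 875_000, 600_413),
]


def absolute_percentage_error(actual: float, predicted: float) -> float:
    """Return |actual - predicted| / actual as a fraction."""
    return abs(actual - predicted) / actual


def mape(pairs) -> float:
    """Mean absolute percentage error (as a fraction) over (actual, predicted) pairs."""
    errors = [absolute_percentage_error(a, p) for a, p in pairs]
    return sum(errors) / len(errors)


# Assumed definition: accuracy = 100 * (1 - absolute percentage error).
per_row_accuracy = {
    name: 100 * (1 - absolute_percentage_error(actual, predicted))
    for name, actual, predicted in CASES
}
sample_mape = 100 * mape([(actual, predicted) for _, actual, predicted in CASES])

for name, acc in per_row_accuracy.items():
    print(f"{name}: {acc:.2f}% accuracy")
print(f"MAPE over these three cases: {sample_mape:.2f}%")
```

The average over these three listings is not comparable to the 28% MAPE reported for the full test set, since it covers only a handful of hand-picked cases.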
- Feature Engineering: Implemented Haversine Distance algorithms to calculate proximity to major NYC landmarks (Times Square & Wall Street), capturing the "Location, Location, Location" factor.
- Data Pipeline: Built a `ColumnTransformer` pipeline for automated encoding of categorical variables and scaling of numerical features.
- Advanced Modeling: Used a `Random Forest Regressor` with a log transformation of the target variable to handle price skewness and keep very expensive listings from dominating the fit.
- Interactive UI: Developed a `Streamlit` dashboard featuring an integrated `Folium` map for real-time geographic coordinate selection.
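The landmark-proximity features mentioned in the list above can be sketched with the standard Haversine formula. This is a minimal illustration, not the project's actual code: the landmark coordinates are approximate values assumed here, and the function and feature names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# Approximate landmark coordinates (assumed for this sketch, not from the repo).
LANDMARKS = {
    "times_square": (40.7580, -73.9855),
    "wall_street": (40.7060, -74.0088),
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def landmark_distances(lat: float, lon: float) -> dict:
    """Return one distance feature per landmark (hypothetical feature names)."""
    return {
        f"dist_{name}_km": haversine_km(lat, lon, lm_lat, lm_lon)
        for name, (lm_lat, lm_lon) in LANDMARKS.items()
    }


# Example: the Manhattan test case from the validation table.
features = landmark_distances(40.7661, -73.9641)
for key, value in features.items():
    print(f"{key}: {value:.2f}")
```

Distances computed this way are straight-line ("as the crow flies") values, so they ignore street layout and travel time.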
├── app.py # Streamlit Web Application
├── data/
│ ├── raw/ # Original dataset
│ ├── cleaned/ # Cleaned dataset
│ └── processed/ # Feature-engineered data ready for modeling
├── models/
│ ├── model.joblib # Trained Random Forest Model
│ └── preprocessor.joblib # Fitted Scikit-Learn Pipeline
├── notebooks/
│ ├── 01_data_loading_and_overview.ipynb
│ ├── 02_cleaning.ipynb
│ ├── 03_eda.ipynb
│ ├── 04_preprocessing
│ └── 05_modeling
├── requirements.txt # Reproducibility list
└── README.md
- Clone the repo -> `git clone https://github.com/Nourelden85/nyc-housing-prediction.git`
- Install dependencies -> `pip install -r requirements.txt`
- Run the App -> `streamlit run app.py`