- Project Overview
- Key Features
- Methodology
- Key Findings
- Technologies Used
- Project Structure
- Setup and Usage
- Conclusion
This project aims to predict U.S. economic recessions by leveraging a combination of traditional macroeconomic indicators and network analysis. The core idea is to model the economy as a complex network where indicators are nodes and their correlations are edges. By analyzing the structural properties of this network over time, we can extract novel features that may capture systemic risk and financial contagion, potentially improving the predictive power of machine learning models.
The project compares a baseline model, trained only on standard time-series features, with an enhanced model augmented by network centrality metrics. Model interpretability is explored using SHAP to understand which features are most influential in predicting economic downturns.
- Data Acquisition: Fetches a wide range of economic data from the Federal Reserve Economic Data (FRED) database and stock market data from Yahoo Finance.
- Time-Series Feature Engineering: Creates standard predictive features from raw data, including month-over-month change, year-over-year change, and 12-month rolling means and standard deviations.
- Dynamic Network Construction: Builds dynamic correlation-based networks for each month in the dataset.
- Network Feature Engineering: Calculates key centrality measures (Degree, Betweenness, Eigenvector) for each economic indicator to quantify its systemic importance over time.
- Predictive Modeling: Implements and compares Logistic Regression and XGBoost classification models to predict the probability of a recession within a 6-month horizon.
- Rigorous Evaluation: Uses a chronological train-validation-test split to prevent data leakage and evaluates models using ROC AUC, Precision, Recall, and F1-Score.
- Model Interpretation: Employs SHAP (SHapley Additive exPlanations) to analyze feature importance and understand the drivers behind model predictions.
- Visualization: Generates visualizations of the economic network at critical periods (e.g., pre-recession, during recession) and plots the evolution of network features over time.
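The time-series feature engineering step listed above can be illustrated with a minimal pandas sketch. It assumes a monthly DataFrame with one column per indicator and takes "change" to mean a simple difference; the notebooks may use percentage changes instead. The function name `add_time_series_features`, the column suffixes, and the toy data are hypothetical and not taken from the project.

```python
import numpy as np
import pandas as pd


def add_time_series_features(indicators: pd.DataFrame, window: int = 12) -> pd.DataFrame:
    """Derive four features per indicator from a monthly DataFrame.

    Assumption: "change" is a simple difference; the project may use
    percentage changes for some or all indicators.
    """
    parts = []
    for name in indicators.columns:
        series = indicators[name]
        parts.append(
            pd.DataFrame(
                {
                    f"{name}_mom": series.diff(1),    # month-over-month change
                    f"{name}_yoy": series.diff(12),   # year-over-year change
                    f"{name}_roll_mean": series.rolling(window).mean(),
                    f"{name}_roll_std": series.rolling(window).std(),
                }
            )
        )
    return pd.concat(parts, axis=1)


# Toy data only: two made-up indicators over 36 months.
months = pd.date_range("2000-01-01", periods=36, freq="MS")
toy = pd.DataFrame(
    {
        "indicator_a": np.arange(36, dtype=float),
        "indicator_b": np.arange(36, dtype=float) ** 2,
    },
    index=months,
)

features = add_time_series_features(toy)
print(features.dropna().head())
```

With four derived columns per indicator, 12 indicators would yield 48 features, which is consistent with the count given in the methodology. The first 12 rows contain missing values because the year-over-year difference needs a full year of history.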
The project is structured into a series of sequential Jupyter notebooks:
- `01_data_preprocessing_and_feature_engineering.ipynb`:
  - Loads raw data for 12 leading economic indicators (e.g., Yield Curve, Unemployment, CPI) and the S&P 500.
  - Standardizes all data to a consistent monthly frequency.
  - Engineers 48 time-series features (YoY change, rolling means, etc.).
  - Prepares the final clean dataset with features and the NBER recession indicator as the target.
- `02_network_construction_and_feature_engineering.ipynb`:
  - Constructs a monthly correlation network based on a 12-month rolling window of the base indicators.
  - Calculates three centrality measures (Degree, Betweenness, Eigenvector) for each feature in each monthly network.
  - Creates a new dataset of 144 network-based features.
- `03_modeling_and_evaluation.ipynb`:
  - Augments the traditional time-series features with the lagged network features.
  - Splits the data chronologically into training (1961-2000), validation (2001-2010), and test (2011-2024) sets.
  - Trains and evaluates two model configurations:
    - Baseline Model: Logistic Regression on traditional features only.
    - Enhanced Models: Logistic Regression and XGBoost on the augmented (traditional + network) feature set.
  - Saves the best-performing model (`enhanced_xgb_model.joblib`) and its associated scaler.
- `04_model_interpretation_with_shap.ipynb`:
  - Loads the saved XGBoost model.
  - Performs SHAP analysis to identify the most important features in the full model.
  - Conducts an experiment by removing the top traditional features and re-training the model to see if network features become more prominent.
- `05_network_visualization_and_analysis.ipynb`:
  - Visualizes the structure of the economic network graph at critical points in time (e.g., stable periods vs. pre-recession).
  - Plots the time series of key network centrality metrics against historical recession periods to observe their behavior during economic cycles.
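The network construction and centrality steps in the methodology can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebooks' actual code: the README does not say how correlations become edges, so the absolute-correlation cutoff `threshold=0.5` is a hypothetical choice, as are the function names, column suffixes, and toy data.

```python
import networkx as nx
import numpy as np
import pandas as pd


def correlation_network(window_df: pd.DataFrame, threshold: float = 0.5) -> nx.Graph:
    """Build an undirected graph with one node per column of window_df.

    Assumption: an edge is added when the absolute Pearson correlation over
    the window is at least `threshold` (a hypothetical cutoff).
    """
    corr = window_df.corr()
    names = list(corr.columns)
    graph = nx.Graph()
    graph.add_nodes_from(names)  # keep uncorrelated indicators as isolated nodes
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            value = corr.loc[left, right]
            if pd.notna(value) and abs(value) >= threshold:
                graph.add_edge(left, right, weight=abs(float(value)))
    return graph


def centrality_features(graph: nx.Graph) -> dict:
    """Return degree, betweenness, and eigenvector centrality for every node."""
    degree = nx.degree_centrality(graph)
    betweenness = nx.betweenness_centrality(graph)
    try:
        eigenvector = nx.eigenvector_centrality(graph, max_iter=1000, weight="weight")
    except nx.PowerIterationFailedConvergence:
        # Fall back to missing values if power iteration does not converge.
        eigenvector = {node: float("nan") for node in graph}
    row = {}
    for node in graph.nodes:
        row[f"{node}_degree"] = degree[node]
        row[f"{node}_betweenness"] = betweenness[node]
        row[f"{node}_eigenvector"] = eigenvector[node]
    return row


def rolling_network_features(
    indicators: pd.DataFrame, window: int = 12, threshold: float = 0.5
) -> pd.DataFrame:
    """Compute one row of centrality features per month with a full window."""
    rows = {}
    for end in range(window, len(indicators) + 1):
        window_df = indicators.iloc[end - window:end]
        graph = correlation_network(window_df, threshold)
        rows[indicators.index[end - 1]] = centrality_features(graph)
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy data only: ind_a and ind_b move together, ind_c alternates in sign.
months = pd.date_range("2000-01-01", periods=24, freq="MS")
t = np.arange(24, dtype=float)
toy = pd.DataFrame(
    {"ind_a": t, "ind_b": 2.0 * t + 1.0, "ind_c": (-1.0) ** t},
    index=months,
)

network_features = rolling_network_features(toy, window=12)
print(network_features.head())
```

Each monthly row is computed only from the 12 months ending in that month. Because the methodology uses lagged network features in the modeling step, one option is to shift the resulting frame (for example `network_features.shift(1)`) before joining it to the time-series features; the lag actually used in the notebooks is not stated here.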
- Model Performance: The Enhanced XGBoost model, which included network features, was the top performer. It achieved a ROC AUC of 0.95 on the validation set, outperforming the baseline model.
- Feature Importance: SHAP analysis revealed that the model's predictions were primarily driven by powerful, well-known macroeconomic indicators like the OECD Leading Indicator (USALOLITONOSTSAM), Yield Curve (T10Y3MM), and Inflation (CPIAUCSL).
- Impact of Network Features: While the network-augmented dataset improved model performance, the network features themselves were not the primary drivers. They appeared to capture information that was largely redundant or less powerful than that contained in the standard time-series features.
- Final Conclusion: The project demonstrates that while network analysis provides a measurable improvement for non-linear models like XGBoost, the most critical predictors for this specific recession forecasting task remain the well-established macroeconomic variables.
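The validation figure above depends on how the target and the chronological split are defined. The sketch below shows one plausible reading: a month is labeled positive if a recession occurs in any of the following 6 months, and the data is then cut by calendar year into the 1961-2000, 2001-2010, and 2011-2024 periods from the methodology. The function names and the exact target definition are assumptions, and the recession flag is a toy placeholder, not the NBER series.

```python
import pandas as pd


def forward_recession_target(recession: pd.Series, horizon: int = 6) -> pd.Series:
    """Label month t as 1 if any of months t+1 .. t+horizon is a recession.

    Assumption: this is one reading of "within a 6-month horizon". The last
    `horizon` months get NaN because their future is not fully observed.
    """
    future = pd.concat(
        [recession.shift(-k) for k in range(1, horizon + 1)], axis=1
    )
    return future.max(axis=1, skipna=False)


def chronological_split(frame: pd.DataFrame):
    """Split by calendar year so no future data reaches the training set."""
    train = frame.loc["1961":"2000"]
    validation = frame.loc["2001":"2010"]
    test = frame.loc["2011":"2024"]
    return train, validation, test


# Toy placeholder flag (NOT the NBER series): one made-up recession episode.
index = pd.date_range("1961-01-01", "2024-12-01", freq="MS")
recession = pd.Series(0, index=index, name="recession")
recession.loc["2008-01":"2009-06"] = 1

data = pd.DataFrame({"recession": recession})
data["target"] = forward_recession_target(data["recession"], horizon=6)
data = data.dropna(subset=["target"])  # drop months with an incomplete horizon

train, validation, test = chronological_split(data)
print(len(train), len(validation), len(test))
```

Any scaler or model would then be fit on `train` only, with `validation` used for model selection and `test` held back for the final check.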
- Language: Python 3.x
- Libraries:
  - `pandas` & `numpy` for data manipulation
  - `scikit-learn` for modeling and preprocessing
  - `xgboost` for the gradient boosting model
  - `fredapi` & `yfinance` for data acquisition
  - `networkx` for graph creation and analysis
  - `shap` for model interpretation
  - `matplotlib` & `seaborn` for visualization
  - `joblib` for saving/loading models
  - `python-dotenv` for environment variable management
- Environment: Jupyter Notebook
Recession_Prediction_Network_Analysis/
│
├── data/
│   ├── raw_economic_indicators.csv
│   ├── final_prepared_data.csv
│   ├── network_features.csv
│   └── ... (other data files)
│
├── models/
│   ├── enhanced_xgb_model.joblib
│   └── scaler_enhanced.joblib
│
├── visualizations/
│   ├── shap_summary_plot_FULL.png
│   ├── network_dynamic_stable_period_2015-mid.png
│   └── ... (other plots)
│
├── 01_data_preprocessing_and_feature_engineering.ipynb
├── 02_network_construction_and_feature_engineering.ipynb
├── 03_modeling_and_evaluation.ipynb
├── 04_model_interpretation_with_shap.ipynb
├── 05_network_visualization_and_analysis.ipynb
│
├── .env.example
├── requirements.txt
└── README.md
To run this project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/your-username/Recession_Prediction_Network_Analysis.git
  cd Recession_Prediction_Network_Analysis
- Create and activate a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install the required dependencies:
  pip install -r requirements.txt
- Set up your FRED API key:
  - Obtain a free API key from the FRED website.
  - Rename the `.env.example` file to `.env`.
  - Add your API key to the `.env` file: FRED_API_KEY="your_api_key_here"
- Run the Jupyter notebooks: launch Jupyter Notebook or JupyterLab and run the notebooks in sequential order, from `01` to `05`.
  jupyter notebook
This project provides a comprehensive framework for recession forecasting that integrates traditional econometrics with network science. The results indicate that while network-based features can enhance a powerful non-linear model, they do not supplant the predictive dominance of established macroeconomic indicators. This suggests that for this task, the value of network analysis may lie in providing a secondary, confirmatory signal rather than being a primary predictive tool. Future work could explore more sophisticated network construction methods or different sets of economic variables.