An interactive web application that predicts Type 2 diabetes risk from demographic, anthropometric, and clinical indicators using a Random Forest model trained on NHANES 2021 L-cycle data.
This Streamlit app wraps the machine learning model from my NHANES diabetes prediction research into a deployable, interactive tool. A user enters clinical values (age, BMI, waist circumference, blood pressure, cholesterol, etc.) and receives a real-time risk prediction with:
- Risk probability displayed as a gauge chart
- Feature importance showing which predictors drive the model
- Individual risk profile comparing inputs to reference ranges
- Personalized insights highlighting specific risk factors
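The risk-profile comparison in the list above can be sketched in a few lines. This is a minimal illustration, not the app's actual code: the function name `compare_to_reference`, the input keys, and the range values are all assumptions chosen for the example (the BMI band follows the common 18.5 to 24.9 "normal weight" range; the others are placeholders).

```python
# Hypothetical reference ranges as (low, high); values are illustrative only.
REFERENCE_RANGES = {
    "bmi": (18.5, 24.9),
    "systolic_bp": (90.0, 120.0),
    "total_cholesterol": (125.0, 200.0),
}


def compare_to_reference(inputs, ranges=REFERENCE_RANGES):
    """Label each input as 'below', 'within', or 'above' its reference range."""
    profile = {}
    for name, value in inputs.items():
        if name not in ranges:
            continue  # no reference range defined for this input; skip it
        low, high = ranges[name]
        if value < low:
            profile[name] = "below"
        elif value > high:
            profile[name] = "above"
        else:
            profile[name] = "within"
    return profile


example = compare_to_reference({"bmi": 31.2, "systolic_bp": 118, "age": 54})
print(example)
```

Inputs without a defined range (here `age`) are left out of the result rather than guessed at, which keeps the comparison honest about what it can say.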
Built with:
- Python: scikit-learn, pandas, NumPy
- Streamlit: interactive web app framework
- Plotly: gauge chart and data visualizations
- Model: Random Forest (500 trees, mtry = √p, i.e. max_features="sqrt" in scikit-learn), trained on ~6,300 NHANES observations
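The model configuration listed above maps onto scikit-learn roughly as follows. This is a sketch under assumptions, not the contents of `train_model.py`: the data is synthetic, the feature count and `random_state` are arbitrary, and only the 500 trees and the √p features per split come from the description above.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the NHANES predictors (assumption: 8 numeric features).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

# 500 trees; sqrt(p) candidate features considered at each split.
model = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=0)
model.fit(X, y)

# Round-trip through pickle, the way a model.pkl file would be saved and loaded.
restored = pickle.loads(pickle.dumps(model))

# Probability of the positive class for a single observation.
risk = restored.predict_proba(X[:1])[0, 1]
print(f"Predicted risk: {risk:.2f}")
```

The positive-class probability from `predict_proba` is the kind of value a gauge chart would display.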
To run locally:
git clone https://github.com/JonathanMcCord/diabetes-risk-predictor.git
cd diabetes-risk-predictor
pip install -r requirements.txt
python train_model.py  # Train and save model.pkl
streamlit run app.py   # Launch at http://localhost:8501

To deploy on Streamlit Community Cloud:
- Push this repo to GitHub (public or private)
- Go to share.streamlit.io
- Sign in with GitHub and click New app
- Select your repo, branch main, and main file app.py
- Click Deploy. Your app will be live in ~2 minutes

Important: Make sure model.pkl is committed to the repo (it's small enough at ~5 MB).
diabetes-risk-predictor/
├── app.py # Streamlit application
├── train_model.py # Model training script
├── model.pkl # Trained Random Forest model
├── requirements.txt # Python dependencies
└── README.md # This file
The model mirrors the methodology from my graduate research project (DSCI 610, Ball State University):
- Outcome: Type 2 diabetes, defined by self-reported diagnosis OR HbA1c ≥ 6.5% OR fasting glucose ≥ 126 mg/dL
- Key design decision: HbA1c and fasting glucose were excluded from predictors to prevent data leakage
- Top predictors: Age and waist circumference dominated across all classifiers
- Clinical finding: Waist circumference outperformed BMI — consistent with research showing visceral adiposity as a better diabetes risk proxy
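The outcome definition and the leakage rule above can be expressed as a short sketch. The thresholds (6.5% and 126 mg/dL) come from the text; the function name, the column names, and the treatment of a missing lab value as "criterion not met" are assumptions made for illustration.

```python
HBA1C_THRESHOLD = 6.5      # percent
GLUCOSE_THRESHOLD = 126.0  # mg/dL, fasting


def has_diabetes(self_reported, hba1c=None, fasting_glucose=None):
    """Return True if any of the three outcome criteria is met.

    Assumption: a missing lab value (None) counts as "criterion not met".
    """
    if self_reported:
        return True
    if hba1c is not None and hba1c >= HBA1C_THRESHOLD:
        return True
    if fasting_glucose is not None and fasting_glucose >= GLUCOSE_THRESHOLD:
        return True
    return False


# Hypothetical column names. The two lab values define the outcome, so they
# are removed from the predictor list to avoid leaking the label.
ALL_COLUMNS = ["age", "bmi", "waist_cm", "systolic_bp", "hba1c", "fasting_glucose"]
LEAKY_COLUMNS = {"hba1c", "fasting_glucose"}
PREDICTORS = [c for c in ALL_COLUMNS if c not in LEAKY_COLUMNS]

print(has_diabetes(False, hba1c=6.7))
print(PREDICTORS)
```

If the lab values stayed in the predictor set, the model could reconstruct the label directly from them, and its reported accuracy would say little about risk estimation from routine measurements.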
This is a portfolio project for educational and demonstration purposes. It is not a medical diagnostic device and should not be used to make health decisions.
Built by Jonathan McCord · MS Data Science, Ball State University (2026)