
JonathanMcCord/diabetes-risk-predictor


🔬 Type 2 Diabetes Risk Predictor

An interactive web application that predicts Type 2 diabetes risk from demographic, anthropometric, and clinical indicators using a Random Forest model trained on data from the NHANES August 2021–August 2023 ("L") cycle.

▶ Live Demo · Research Project · LinkedIn


What This Is

This Streamlit app wraps the machine learning model from my NHANES diabetes prediction research into a deployable, interactive tool. A user enters clinical values (age, BMI, waist circumference, blood pressure, cholesterol, etc.) and receives a real-time risk prediction with:

  • Risk probability displayed as a gauge chart
  • Feature importance showing which predictors drive the model
  • Individual risk profile comparing inputs to reference ranges
  • Personalized insights highlighting specific risk factors
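The "individual risk profile" and "personalized insights" features boil down to comparing each input against a reference range and surfacing the ones that fall outside it. The sketch below is hypothetical: the `Patient` and `risk_flags` names are invented for illustration, and the thresholds are common clinical cut-offs (BMI 25/30, sex-specific waist limits of 102 cm and 88 cm, blood pressure 130/80 mmHg) assumed here, not necessarily the ranges `app.py` uses.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    sex: str          # "male" or "female"
    bmi: float        # kg/m^2
    waist_cm: float   # waist circumference in cm
    systolic: float   # mmHg
    diastolic: float  # mmHg


def risk_flags(p: Patient) -> list[str]:
    """Return readable flags for inputs outside ASSUMED reference ranges."""
    flags = []
    # BMI bands: >= 30 obese, 25-29.9 overweight (assumed cut-offs)
    if p.bmi >= 30:
        flags.append("BMI in obese range (>= 30)")
    elif p.bmi >= 25:
        flags.append("BMI in overweight range (25-29.9)")
    # Sex-specific waist cut-offs: > 102 cm for men, > 88 cm for women (assumed)
    waist_limit = 102 if p.sex == "male" else 88
    if p.waist_cm > waist_limit:
        flags.append(f"Waist circumference above {waist_limit} cm")
    # Either reading at or above 130/80 mmHg triggers the flag (assumed)
    if p.systolic >= 130 or p.diastolic >= 80:
        flags.append("Blood pressure at or above 130/80 mmHg")
    return flags
```

For example, `risk_flags(Patient("male", 31.2, 108, 128, 82))` flags BMI, waist circumference, and blood pressure (the diastolic reading alone crosses 80 mmHg).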

Tech Stack

  • Python — scikit-learn, pandas, NumPy
  • Streamlit — interactive web app framework
  • Plotly — gauge chart and data visualizations
  • Model — Random Forest (500 trees, √p candidate predictors per split, i.e. mtry = √p), trained on ~6,300 NHANES observations
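In scikit-learn, the "500 trees, mtry = √p" configuration corresponds to `n_estimators=500` and `max_features="sqrt"` on `RandomForestClassifier`. The sketch below shows that configuration together with how a per-user probability (for the gauge) and feature importances (for the importance chart) can be read off the fitted model. The feature names are hypothetical and the data is synthetic, not NHANES; the real feature set and preprocessing live in `train_model.py`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical predictor names; the real ones are defined in train_model.py.
FEATURES = ["age", "bmi", "waist_cm", "systolic_bp", "total_chol"]


def build_model(seed: int = 42) -> RandomForestClassifier:
    # 500 trees; max_features="sqrt" considers sqrt(p) randomly chosen
    # predictors at each split, the scikit-learn analogue of mtry = sqrt(p).
    return RandomForestClassifier(
        n_estimators=500, max_features="sqrt", random_state=seed, n_jobs=-1
    )


# Synthetic stand-in data (NOT NHANES): the label depends on age and waist only.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(400, len(FEATURES))), columns=FEATURES)
y = (X["age"] + X["waist_cm"] + rng.normal(scale=0.5, size=len(X)) > 0).astype(int)

model = build_model().fit(X, y)

# Probability of the positive class for a single input row (gauge value).
proba = model.predict_proba(X.iloc[[0]])[0, 1]

# Impurity-based importances, normalized to sum to 1 (importance chart).
importances = pd.Series(model.feature_importances_, index=FEATURES).sort_values(
    ascending=False
)
```

Impurity-based importances are cheap because they come for free with the fitted forest; `sklearn.inspection.permutation_importance` is a slower alternative that is less biased toward continuous, high-cardinality predictors.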

Run Locally

git clone https://github.com/JonathanMcCord/diabetes-risk-predictor.git
cd diabetes-risk-predictor
pip install -r requirements.txt
python train_model.py        # Train and save model.pkl
streamlit run app.py         # Launch at http://localhost:8501
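`train_model.py` writes the fitted model to `model.pkl`, which `app.py` then loads. One reasonable way to do that, shown as a hypothetical sketch (the `save_model`/`load_model` helpers and the bundle layout are assumptions, not the repo's actual code), is to pickle the estimator together with its feature order so the app always builds input rows with matching columns. The demo writes to a temporary directory so it never overwrites a real `model.pkl`.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def save_model(model, features, path):
    # Store the column order next to the estimator so the app can assemble
    # user input in exactly the order the model was trained on.
    with open(path, "wb") as f:
        pickle.dump({"model": model, "features": list(features)}, f)


def load_model(path):
    # Only unpickle files you trust: pickle can execute arbitrary code.
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    return bundle["model"], bundle["features"]


# Round-trip demo on toy data, written to a temp dir (hypothetical feature names).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)

tmp_path = Path(tempfile.mkdtemp()) / "model.pkl"
save_model(clf, ["age", "bmi", "waist_cm"], tmp_path)
loaded, features = load_model(tmp_path)
```

Pickled scikit-learn models are not guaranteed to load under a different scikit-learn version, so pinning the version in `requirements.txt` keeps the committed `model.pkl` usable on the deployment server.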

Deploy to Streamlit Community Cloud (Free)

  1. Push this repo to GitHub (public or private)
  2. Go to share.streamlit.io
  3. Sign in with GitHub and click New app
  4. Select your repo, branch main, and main file app.py
  5. Click Deploy — your app will be live in ~2 minutes

Important: Make sure model.pkl is committed to the repo. At ~5 MB it is well under GitHub's 100 MB per-file limit, and the deployed app cannot run train_model.py for you.

Repository Structure

diabetes-risk-predictor/
├── app.py               # Streamlit application
├── train_model.py       # Model training script
├── model.pkl            # Trained Random Forest model
├── requirements.txt     # Python dependencies
└── README.md            # This file

Model Details

The model mirrors the methodology from my graduate research project (DSCI 610, Ball State University):

  • Outcome: Type 2 diabetes, defined by self-reported diagnosis OR HbA1c ≥ 6.5% OR fasting glucose ≥ 126 mg/dL
  • Key design decision: HbA1c and fasting glucose were excluded from the predictors to prevent data leakage, since they are part of the outcome definition itself
  • Top predictors: Age and waist circumference dominated across all classifiers compared in the research project
  • Clinical finding: Waist circumference outperformed BMI, consistent with research suggesting that visceral adiposity, which waist circumference captures better than BMI, is a stronger indicator of diabetes risk
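The outcome definition and the leakage rule above translate directly into code: build the label from any of the three criteria, then drop every column used to construct it before training. This is a minimal sketch in pandas; the column names (`told_diabetes`, `hba1c_pct`, `fasting_glucose_mgdl`) are hypothetical stand-ins for the corresponding NHANES questionnaire and laboratory variables, and the coding `told_diabetes == 1` for "yes" is an assumption.

```python
import pandas as pd

# Hypothetical column names for the outcome-defining variables.
SELF_REPORT_COL = "told_diabetes"   # assumed: 1 = "yes, told by a doctor"
LAB_COLS = ["hba1c_pct", "fasting_glucose_mgdl"]


def label_diabetes(df: pd.DataFrame) -> pd.Series:
    """1 if self-reported diagnosis OR HbA1c >= 6.5% OR fasting glucose >= 126 mg/dL."""
    self_report = df[SELF_REPORT_COL].eq(1)
    # Missing lab values compare as False, i.e. they do not meet the criterion.
    high_a1c = df["hba1c_pct"].ge(6.5)
    high_fpg = df["fasting_glucose_mgdl"].ge(126)
    return (self_report | high_a1c | high_fpg).astype(int)


def split_features(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    y = label_diabetes(df)
    # Drop every column that defines the label so the model cannot see it.
    X = df.drop(columns=[SELF_REPORT_COL, *LAB_COLS])
    return X, y
```

Dropping the self-report column alongside the two lab values follows the same logic: any variable that feeds the label would let the model recover the answer rather than learn risk factors.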

Disclaimer

This is a portfolio project for educational and demonstration purposes. It is not a medical diagnostic device and should not be used to make health decisions.


Built by Jonathan McCord · MS Data Science, Ball State University (2026)
