Reinforcement Learning for Condition-Based Maintenance of Industrial Equipment
A Reinforcement Learning MVP (Minimum Viable Product) for Condition-Based Maintenance (CBM) using industrial equipment temperature sensor data. This project implements a QR-DQN (Quantile Regression Deep Q-Network) agent that learns maintenance policies balancing risk mitigation against maintenance cost.
Key Features:
- 2×2 Markov state transition model (Normal / Anomalous)
- QR-DQN with distributional RL for uncertainty quantification
- Reward design balancing risk suppression and cost minimization
- Transition matrix estimated from real measurement data
- Complete integration of high-quality base implementation (v2.0)
- 45× speedup over v1.0 (0.142 sec/episode) from PER, N-step learning, AMP, parallel environments, and Noisy Networks
flowchart TB
subgraph Data["📊 Data Preprocessing"]
A1["Equipment CSV<br/>Measurement CSV"] --> A2["data_preprocessor.py"]
A2 --> A3["Statistical Threshold<br/>μ ± 2σ"]
A3 --> A4["State Classification<br/>Normal/Anomalous"]
A4 --> A5["2×2 Transition Matrix<br/>P = [[0.2948, 0.7052],<br/> [0.0731, 0.9269]]"]
end
subgraph Env["🏭 Environment"]
B1["cbm_environment.py"]
B2["Gymnasium Compatible"]
B3["3 Maintenance Scenarios<br/>· Safety First<br/>· Balanced<br/>· Cost Efficient"]
B1 --> B2
B2 --> B3
end
subgraph Train["🤖 QR-DQN Training"]
C1["train_cbm_dqn_v2.py"]
C2["QR-DQN<br/>51 quantiles"]
C3["Optimizations<br/>· PER α=0.6<br/>· N-step n=3<br/>· AMP<br/>· Parallel 16 envs<br/>· Noisy Networks"]
C4["Training Results<br/>policy_net.pth<br/>training_history.json"]
C1 --> C2
C2 --> C3
C3 --> C4
end
subgraph Viz["📈 Visualization"]
D1["visualize_results.py"]
D2["Training Curves<br/>Distribution Analysis<br/>VaR/CVaR"]
D1 --> D2
end
subgraph Compare["🔬 Scenario Comparison"]
E1["compare_scenarios.py<br/>visualize_scenarios.py"]
E2["3 Scenarios in Parallel<br/>1000 episodes each"]
E3["Comparison Results<br/>· Safety First: 8.45<br/>· Balanced: 24.31 🏆<br/>· Cost Efficient: -129.31"]
E4["Detailed Viz<br/>· Learning Curves<br/>· Per-Scenario Details<br/> (2×2 subplots)"]
E1 --> E2
E2 --> E3
E3 --> E4
end
subgraph Lessons["📚 Lessons"]
F1["Scenario_Lessons.md"]
F2["Optimal Parameters<br/>risk_weight=1.0<br/>cost_lambda=0.15"]
F3["Failure Pattern Analysis<br/>· Over-maintenance<br/>· Under-maintenance"]
F1 --> F2
F1 --> F3
end
A5 --> B1
B3 --> C1
C4 --> D1
C4 --> E1
E4 --> F1
style Data fill:#e3f2fd
style Env fill:#f3e5f5
style Train fill:#fff3e0
style Viz fill:#e8f5e9
style Compare fill:#fce4ec
style Lessons fill:#fff9c4
# Clone repository
git clone https://github.com/YOUR_USERNAME/dql-equipment-cbm.git
cd dql-equipment-cbm/equipment-cbm-mvp
# Install dependencies
pip install -r requirements.txt
python data_preprocessor.py
Output:
- Extracts 1,843 temperature measurements from Boiler (40t)
- Automatically calculates statistical thresholds: Smin=13.02°C (μ-2σ)
- State distribution: 9.4% Normal, 90.6% Anomalous
- Estimates 2×2 transition matrix from real data
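The threshold and transition-matrix steps listed above can be sketched in plain Python. This is a minimal illustration, not the code in data_preprocessor.py: the function names are hypothetical, the population standard deviation is an assumption, and so is the rule that in-band readings count as Normal.

```python
from statistics import mean, pstdev

def sigma_thresholds(values, k=2.0):
    """Return (Smin, Smax) = (mu - k*sigma, mu + k*sigma)."""
    mu, sigma = mean(values), pstdev(values)  # assumption: population std
    return mu - k * sigma, mu + k * sigma

def classify(values, s_min, s_max):
    """Assumption: readings inside [s_min, s_max] are Normal (0), others Anomalous (1)."""
    return [0 if s_min <= v <= s_max else 1 for v in values]

def estimate_transition_matrix(states, n_states=2):
    """Count consecutive state pairs and normalize each row to probabilities."""
    counts = [[0] * n_states for _ in range(n_states)]
    for s, s_next in zip(states, states[1:]):
        counts[s][s_next] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no observed transitions fall back to a uniform distribution.
        matrix.append([c / total if total else 1.0 / n_states for c in row])
    return matrix

# Toy state sequence (not the real measurement data).
states = [0, 0, 1, 1, 1, 0, 1, 1]
P = estimate_transition_matrix(states)
```

Each row of `P` sums to 1, with row 0 giving the transition probabilities out of Normal and row 1 those out of Anomalous.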
Recommended: v2.0 with Full Optimizations
python train_cbm_dqn_v2.py --episodes 1000 --n_envs 16
Performance:
- Speed: 0.142 sec/episode (45× faster than v1.0)
- Optimizations: PER, N-step, AMP, Parallel Envs, Noisy Networks
- Training time: ~2 minutes for 1000 episodes
# Run all 3 scenarios and compare (~6 minutes)
python compare_scenarios.py
# Or visualize existing results
python visualize_scenarios.py
python visualize_results.py --output_dir outputs_cbm_v2 --analyze_dist --n_samples 1000
Generates:
- Training curves (reward, loss, episode length)
- QR-DQN distribution statistics (VaR, CVaR, IQR)
- Policy evaluation and action distributions
- Risk profile analysis
dql-equipment-cbm/
├── equipment-cbm-mvp/
│ ├── data_preprocessor.py # CSV loading & preprocessing
│ ├── cbm_environment.py # 2×2 Markov environment (Gymnasium)
│ ├── train_cbm_dqn_v2.py # QR-DQN training (v2.0, recommended)
│ ├── visualize_results.py # Visualization with distribution analysis
│ ├── compare_scenarios.py # Multi-scenario comparison
│ ├── visualize_scenarios.py # Scenario visualization
│ ├── Scenario_Lessons.md # Detailed scenario analysis
│ ├── requirements.txt
│ ├── README.md # This file
│ └── README_JP.md # Japanese README
├── data/
│ └── private_benchmark/ # Private measurement data (excluded)
├── outputs_*/ # Training outputs (excluded)
└── .gitignore
| Scenario | Mean Reward | Final 100 Avg | Max Reward | Std Dev | Status |
|---|---|---|---|---|---|
| 🏆 Balanced | 26.36 | 24.31 | 55.00 | 26.71 | Best |
| Safety First | 5.35 | 8.45 | 25.00 | 37.38 | Unstable |
| Cost Efficient | -134.25 | -129.31 | -117.30 | 17.60 | Failed |
Key Finding: The balanced scenario (risk_weight=1.0, cost_lambda=0.15) reaches a final-100-episode average reward roughly 3× that of safety-first (24.31 vs. 8.45), demonstrating the importance of tuning the risk-cost trade-off.
- Training Speed: 0.142 sec/episode (45× faster than baseline)
- Improvement over Rule-based: 88% reward improvement
- GPU: CUDA-enabled for faster training
- Parallel Envs: 16 environments for efficient data collection
State:
- Condition: 0 (Normal) or 1 (Anomalous)
- Temperature: Normalized to [0, 1]
Actions:
- 0: DoNothing - Continue operation (cost: 0)
- 1: Repair - Fix equipment (cost: 3, high normal recovery)
- 2: Replace - Replace equipment (cost: 15, highest normal recovery)
Reward:
R = risk_weight × R_risk + cost_lambda × R_cost
R_risk: +1 (Normal), -10 (Anomalous)
R_cost: -action_cost
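As a rough illustration of the reward design, the sketch below combines the risk term and the action cost into a single scalar. It assumes that cost_lambda scales the action cost exactly once and that the risk term is evaluated on the condition passed in; whether the environment uses the current or the next condition is not stated here. The function and constant names are hypothetical.

```python
# Action costs and risk rewards as listed in this README.
ACTION_COSTS = {0: 0.0, 1: 3.0, 2: 15.0}   # DoNothing, Repair, Replace
RISK_REWARD = {0: 1.0, 1: -10.0}           # Normal, Anomalous

def reward(condition, action, risk_weight=1.0, cost_lambda=0.15):
    """Weighted sum of the risk term and the (negative) action cost.

    Assumption: cost_lambda is applied once to the action cost.
    """
    r_risk = RISK_REWARD[condition]
    r_cost = -ACTION_COSTS[action]
    return risk_weight * r_risk + cost_lambda * r_cost

# Balanced scenario: repairing while Anomalous.
balanced_repair = reward(condition=1, action=1)
# Cost Efficient scenario: replacing while Normal.
cost_eff_replace = reward(condition=0, action=2, risk_weight=0.3, cost_lambda=0.5)
```

Under these assumptions a higher cost_lambda makes Repair and Replace more expensive relative to the penalty for staying Anomalous, which is the trade-off the three scenarios explore.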
Input: [condition, normalized_temp]
↓
Shared Layers: [128, 64] (ReLU)
↓
├─ Value Stream: [64] → [64] → [n_quantiles]
└─ Advantage Stream: [64] → [64] → [3 × n_quantiles]
↓
Dueling: Q = V + (A - mean(A))
↓
Output: Quantile distributions for each action
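The dueling step at the bottom of the diagram can be sketched without any deep-learning framework. This is a plain-Python illustration only: it assumes the advantage mean is taken across actions separately for each quantile, and that the greedy action maximizes the mean of its quantiles. Neither detail is confirmed by the diagram, and the function names are hypothetical.

```python
def dueling_quantiles(value, advantage):
    """Combine stream outputs as Q = V + (A - mean(A)).

    value:     list of n_quantiles floats (value stream output)
    advantage: list of n_actions lists, each with n_quantiles floats
    Assumption: the mean over A is taken across actions, per quantile.
    """
    n_actions = len(advantage)
    n_quantiles = len(value)
    mean_adv = [
        sum(advantage[a][j] for a in range(n_actions)) / n_actions
        for j in range(n_quantiles)
    ]
    return [
        [value[j] + advantage[a][j] - mean_adv[j] for j in range(n_quantiles)]
        for a in range(n_actions)
    ]

def greedy_action(quantiles):
    """Pick the action whose quantile values have the highest mean."""
    q_values = [sum(q) / len(q) for q in quantiles]
    return max(range(len(q_values)), key=q_values.__getitem__)

# Toy example with 2 quantiles and 3 actions.
value = [1.0, 2.0]
advantage = [[0.0, 0.0], [3.0, 3.0], [-3.0, -3.0]]
quantiles = dueling_quantiles(value, advantage)
best = greedy_action(quantiles)
```

Subtracting the mean advantage keeps the value and advantage streams identifiable: adding a constant to every action's advantage leaves the output unchanged.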
- Prioritized Experience Replay (PER)
  - Priority-based sampling (α=0.6)
  - Importance sampling correction (β: 0.4→1.0)
- N-step Learning
  - Multi-step bootstrapping (n=3)
  - Accelerated credit assignment
- Mixed Precision Training (AMP)
  - FP16/FP32 mixed precision
  - GPU memory efficiency
- AsyncVectorEnv
  - 16 parallel environments
  - 16× faster data collection
- Noisy Networks
  - Parameter space exploration
  - No ε-greedy needed
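Two of the optimizations above reduce to short formulas, sketched here in plain Python. The PER part follows the standard proportional scheme from Schaul et al. (sampling probability proportional to priority^α, importance weights normalized by their maximum). The N-step part accumulates discounted rewards before bootstrapping. The discount factor `gamma` and all function names are assumptions for illustration; they are not taken from train_cbm_dqn_v2.py.

```python
def per_probabilities(priorities, alpha=0.6):
    """Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]

def importance_weights(probs, beta=0.4):
    """Weights w_i = (N * P(i))^(-beta), normalized so the largest weight is 1."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    max_w = max(weights)
    return [w / max_w for w in weights]

def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.

    gamma is a hypothetical discount factor; the README does not state one.
    """
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

probs = per_probabilities([4.0, 1.0, 1.0])
weights = importance_weights(probs, beta=0.4)
g3 = n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
```

Annealing β from 0.4 toward 1.0 gradually strengthens the correction for the sampling bias that prioritization introduces.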
Target Equipment:
- Equipment: Boiler (40t), ID: 43175
- Measurement: Temperature_SouthEast_Upper_Wall②, ID: 167473
- Data points: 1,843 measurements
Transition Matrix (Estimated from Real Data):
P = [[0.2948, 0.7052], # Normal → [Normal, Anomalous]
[0.0731, 0.9269]] # Anomalous → [Normal, Anomalous]
Characteristics:
- Normal state is unstable (70.5% probability of moving to Anomalous at the next step)
- Anomalous state is persistent (92.7% probability of staying Anomalous)
- Recovery is difficult (7.3% probability of returning to Normal)
- → Proactive maintenance intervention is crucial
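These characteristics can be checked directly from the matrix. The sketch below computes the long-run state distribution and the expected number of consecutive steps spent in a state for a 2×2 chain, assuming the matrix describes the equipment when no maintenance is performed. The function names are hypothetical.

```python
# Transition matrix from this README: rows are the current state,
# columns the next state, in the order [Normal, Anomalous].
P = [[0.2948, 0.7052],
     [0.0731, 0.9269]]

def stationary_distribution(P):
    """Closed form for a 2-state chain: pi_normal = p10 / (p01 + p10)."""
    p01, p10 = P[0][1], P[1][0]
    pi_normal = p10 / (p01 + p10)
    return [pi_normal, 1.0 - pi_normal]

def expected_sojourn(P, state):
    """Mean number of consecutive steps spent in `state` (geometric holding time)."""
    return 1.0 / (1.0 - P[state][state])

pi = stationary_distribution(P)
anomalous_run = expected_sojourn(P, state=1)
```

The stationary share of Normal comes out at about 9.4%, in line with the 9.4% / 90.6% state distribution reported by the preprocessor, and an Anomalous episode lasts roughly 14 steps on average if nothing is done.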
Safety First:
- risk_weight: 1.0
- cost_lambda: 0.05
- Strategy: Proactive maintenance, minimize equipment downtime
- Result: Reward 8.45 (moderate, but unstable with high variance)
Balanced:
- risk_weight: 1.0
- cost_lambda: 0.15
- Strategy: Optimal balance between safety and cost
- Result: Reward 24.31 (best performance, stable)
Cost Efficient:
- risk_weight: 0.3
- cost_lambda: 0.5
- Strategy: Minimize maintenance costs, tolerate equipment interruptions
- Result: Reward -129.31 (catastrophic failure)
- Best cost_lambda: 0.15 gave the best risk-cost balance of the three scenarios
- cost_lambda < 0.1: Over-maintenance, high variance
- cost_lambda > 0.3: Under-maintenance, catastrophic failure
- Low risk_weight: Prevents learning of maintenance actions
See Scenario_Lessons.md for detailed analysis.
Each scenario generates 7 detailed plots:
- training_history.png - Training progress (reward, loss, length, distribution)
- transition_matrix.png - State transition heatmap
- policy_evaluation.png - Test episode analysis
- distribution_statistics.png - Return distribution shapes
- uncertainty_analysis.png - Variance and IQR comparison
- risk_profile.png - VaR and CVaR analysis
- quantile_distributions.png - QR-DQN quantile functions
Comparison visualizations:
- scenario_comparison.png - Learning curves and final performance
- [scenario]_detailed.png - Per-scenario 2×2 subplot analysis
Automatically handles missing threshold values:
Smin = μ - k×σ (default k=2.0)
Smax = μ + k×σ
QR-DQN provides rich distributional information:
- VaR (Value at Risk): the 5th-percentile return (the 0.05 quantile)
- CVaR (Conditional Value at Risk): the mean return at or below the VaR
- IQR (Interquartile Range): Measure of uncertainty
- Full quantile distributions: 51 quantiles per action
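To make these statistics concrete, the sketch below derives them from one action's quantile values in plain Python. It is an illustration under simple assumptions: quantiles are equally weighted, a quantile level is mapped to an index by truncation, and the function names are hypothetical. visualize_results.py may compute them differently.

```python
def quantile_at(sorted_values, tau):
    """Value at quantile level tau, using a simple truncating index rule."""
    idx = min(len(sorted_values) - 1, int(tau * len(sorted_values)))
    return sorted_values[idx]

def risk_summary(quantile_values, alpha=0.05):
    """VaR, CVaR and IQR from equally weighted quantile values of one action."""
    values = sorted(quantile_values)
    n = len(values)
    tail_end = min(n - 1, int(alpha * n))      # index of the VaR quantile
    var = values[tail_end]
    cvar = sum(values[: tail_end + 1]) / (tail_end + 1)  # mean of the lower tail
    iqr = quantile_at(values, 0.75) - quantile_at(values, 0.25)
    return {"var": var, "cvar": cvar, "iqr": iqr}

# Toy distribution with 51 quantile values: 0.0, 1.0, ..., 50.0.
summary = risk_summary([float(i) for i in range(51)])
```

Because the CVaR averages only the lower tail, it is never larger than the VaR, which makes it the more conservative of the two when comparing actions.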
- Python 3.10+
- PyTorch 2.0+
- Gymnasium 1.0+
- NumPy, Pandas, Matplotlib
- CUDA-capable GPU (recommended)
See requirements.txt for complete list.
- QR-DQN: Dabney et al. "Distributional Reinforcement Learning with Quantile Regression" (AAAI 2018)
- Noisy Networks: Fortunato et al. "Noisy Networks for Exploration" (ICLR 2018)
- Dueling DQN: Wang et al. "Dueling Network Architectures for Deep Reinforcement Learning" (ICML 2016)
- PER: Schaul et al. "Prioritized Experience Replay" (ICLR 2016)
MIT License
Contributions welcome! Please feel free to submit a Pull Request.
If you use this code in your research, please cite:
@software{equipment_cbm_qrdqn,
title={Equipment CBM with QR-DQN: Reinforcement Learning for Condition-Based Maintenance},
author={Your Name},
year={2025},
url={https://github.com/YOUR_USERNAME/dql-equipment-cbm}
}
- Base implementation adapted from base_markov-dqn-v09-quantile (bridge maintenance with 3×3 transition matrix)
- Equipment data provided by industrial partner (confidential)
Created: December 21, 2025
Version: 2.0