
tk-yasuno/dql-equipment-cbm


Equipment CBM with QR-DQN

Reinforcement Learning for Condition-Based Maintenance of Industrial Equipment


Japanese README

Overview

A Reinforcement Learning MVP (Minimum Viable Product) for Condition-Based Maintenance (CBM) using temperature sensor data from industrial equipment. This project implements a QR-DQN (Quantile Regression Deep Q-Network) agent that learns maintenance policies balancing risk mitigation against maintenance cost.

Key Features:

  • 2×2 Markov state transition model (Normal / Anomalous)
  • QR-DQN with distributional RL for uncertainty quantification
  • Reward design balancing risk suppression and cost minimization
  • Transition matrix estimated from real measurement data
  • Built on the base implementation base_markov-dqn-v09-quantile, fully integrated in v2.0
  • 45× training speedup over v1.0 (PER, N-step, AMP, parallel environments, Noisy Networks)

System Architecture

flowchart TB
    subgraph Data["📊 Data Preprocessing"]
        A1["Equipment CSV<br/>Measurement CSV"] --> A2["data_preprocessor.py"]
        A2 --> A3["Statistical Threshold<br/>μ ± 2σ"]
        A3 --> A4["State Classification<br/>Normal/Anomalous"]
        A4 --> A5["2×2 Transition Matrix<br/>P = [[0.2948, 0.7052],<br/>     [0.0731, 0.9269]]"]
    end

    subgraph Env["🏭 Environment"]
        B1["cbm_environment.py"]
        B2["Gymnasium Compatible"]
        B3["3 Maintenance Scenarios<br/>· Safety First<br/>· Balanced<br/>· Cost Efficient"]
        B1 --> B2
        B2 --> B3
    end

    subgraph Train["🤖 QR-DQN Training"]
        C1["train_cbm_dqn_v2.py"]
        C2["QR-DQN<br/>51 quantiles"]
        C3["Optimizations<br/>· PER α=0.6<br/>· N-step n=3<br/>· AMP<br/>· Parallel 16 envs<br/>· Noisy Networks"]
        C4["Training Results<br/>policy_net.pth<br/>training_history.json"]
        C1 --> C2
        C2 --> C3
        C3 --> C4
    end

    subgraph Viz["📈 Visualization"]
        D1["visualize_results.py"]
        D2["Training Curves<br/>Distribution Analysis<br/>VaR/CVaR"]
        D1 --> D2
    end

    subgraph Compare["🔬 Scenario Comparison"]
        E1["compare_scenarios.py<br/>visualize_scenarios.py"]
        E2["3 Scenarios in Parallel<br/>1000 episodes each"]
        E3["Comparison Results<br/>· Safety First: 8.45<br/>· Balanced: 24.31 🏆<br/>· Cost Efficient: -129.31"]
        E4["Detailed Viz<br/>· Learning Curves<br/>· Per-Scenario Details<br/>  (2×2 subplots)"]
        E1 --> E2
        E2 --> E3
        E3 --> E4
    end

    subgraph Lessons["📚 Lessons"]
        F1["Scenario_Lessons.md"]
        F2["Optimal Parameters<br/>risk_weight=1.0<br/>cost_lambda=0.15"]
        F3["Failure Pattern Analysis<br/>· Over-maintenance<br/>· Under-maintenance"]
        F1 --> F2
        F1 --> F3
    end

    A5 --> B1
    B3 --> C1
    C4 --> D1
    C4 --> E1
    E4 --> F1

    style Data fill:#e3f2fd
    style Env fill:#f3e5f5
    style Train fill:#fff3e0
    style Viz fill:#e8f5e9
    style Compare fill:#fce4ec
    style Lessons fill:#fff9c4

Quick Start

Installation

# Clone repository
git clone https://github.com/YOUR_USERNAME/dql-equipment-cbm.git
cd dql-equipment-cbm/equipment-cbm-mvp

# Install dependencies
pip install -r requirements.txt

Data Preprocessing

python data_preprocessor.py

Output:

  • Extracts 1,843 temperature measurements from Boiler (40t)
  • Automatically calculates statistical thresholds: Smin=13.02°C (μ-2σ)
  • State distribution: 9.4% Normal, 90.6% Anomalous
  • Estimates 2×2 transition matrix from real data

Training

Recommended: v2.0 with Full Optimizations

python train_cbm_dqn_v2.py --episodes 1000 --n_envs 16

Performance:

  • Speed: 0.142 sec/episode (45× faster than v1.0)
  • Optimizations: PER, N-step, AMP, Parallel Envs, Noisy Networks
  • Training time: ~2 minutes for 1000 episodes

Scenario Comparison

# Run all 3 scenarios and compare (~6 minutes)
python compare_scenarios.py

# Or visualize existing results
python visualize_scenarios.py

Visualization

python visualize_results.py --output_dir outputs_cbm_v2 --analyze_dist --n_samples 1000

Generates:

  • Training curves (reward, loss, episode length)
  • QR-DQN distribution statistics (VaR, CVaR, IQR)
  • Policy evaluation and action distributions
  • Risk profile analysis

Project Structure

dql-equipment-cbm/
├── equipment-cbm-mvp/
│   ├── data_preprocessor.py       # CSV loading & preprocessing
│   ├── cbm_environment.py          # 2×2 Markov environment (Gymnasium)
│   ├── train_cbm_dqn_v2.py        # QR-DQN training (v2.0, recommended)
│   ├── visualize_results.py       # Visualization with distribution analysis
│   ├── compare_scenarios.py       # Multi-scenario comparison
│   ├── visualize_scenarios.py     # Scenario visualization
│   ├── Scenario_Lessons.md        # Detailed scenario analysis
│   ├── requirements.txt
│   ├── README.md                  # This file
│   └── README_JP.md               # Japanese README
├── data/
│   └── private_benchmark/         # Private measurement data (excluded)
├── outputs_*/                      # Training outputs (excluded)
└── .gitignore

Key Results

Scenario Comparison

| Scenario | Mean Reward | Final 100 Avg | Max Reward | Std Dev | Status |
|---|---|---|---|---|---|
| 🏆 Balanced | 26.36 | 24.31 | 55.00 | 26.71 | Best |
| Safety First | 5.35 | 8.45 | 25.00 | 37.38 | Unstable |
| Cost Efficient | -134.25 | -129.31 | -117.30 | 17.60 | Failed |

Key Finding: The balanced scenario (risk_weight=1.0, cost_lambda=0.15) reaches a final-100-episode average reward of 24.31, about 2.9× that of safety-first (8.45), which shows how strongly the outcome depends on tuning the risk-cost trade-off.

Performance Metrics

  • Training Speed: 0.142 sec/episode (45× faster than baseline)
  • Improvement over rule-based baseline: 88% higher reward
  • GPU: CUDA-enabled for faster training
  • Parallel Envs: 16 environments for efficient data collection

Technical Details

State Space

  • Condition: 0 (Normal) or 1 (Anomalous)
  • Temperature: Normalized to [0, 1]

Action Space

  • 0: DoNothing - Continue operation (cost: 0)
  • 1: Repair - Fix equipment (cost: 3, high probability of returning to Normal)
  • 2: Replace - Replace equipment (cost: 15, highest probability of returning to Normal)

Reward Function

R = risk_weight × R_risk + cost_lambda × R_cost

R_risk: +1 (Normal), -10 (Anomalous)
R_cost: -action_cost
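As a minimal sketch, the reward can be written as a small Python function. It assumes that `cost_lambda` scales the action cost exactly once and that the risk term is evaluated on a single condition value passed in by the caller (whether the repository uses the current or the next state is not stated here). The names `ACTION_COST` and `step_reward` are illustrative and are not taken from `cbm_environment.py`.

```python
# Action costs as listed in the Action Space section.
ACTION_COST = {0: 0.0, 1: 3.0, 2: 15.0}  # DoNothing, Repair, Replace


def step_reward(condition: int, action: int,
                risk_weight: float = 1.0, cost_lambda: float = 0.15) -> float:
    """Per-step reward (assumption: cost_lambda is applied to the cost once)."""
    r_risk = 1.0 if condition == 0 else -10.0  # +1 Normal, -10 Anomalous
    r_cost = -ACTION_COST[action]              # cost enters as a penalty
    return risk_weight * r_risk + cost_lambda * r_cost


# Balanced scenario defaults: repairing while Anomalous.
balanced = step_reward(condition=1, action=1)
# Cost Efficient scenario: doing nothing while Anomalous.
cost_efficient = step_reward(condition=1, action=0, risk_weight=0.3, cost_lambda=0.5)
```

With a low `risk_weight`, the penalty for staying Anomalous shrinks while any maintenance action becomes relatively more expensive, which is consistent with the under-maintenance pattern reported for the Cost Efficient scenario.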

QR-DQN Architecture

Input: [condition, normalized_temp]
  ↓
Shared Layers: [128, 64] (ReLU)
  ↓
  ├─ Value Stream: [64] → [64] → [n_quantiles]
  └─ Advantage Stream: [64] → [64] → [3 × n_quantiles]
  ↓
Dueling: Q = V + (A - mean(A))
  ↓
Output: Quantile distributions for each action
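The dueling step can be illustrated without PyTorch. The sketch below uses plain Python lists to keep it dependency-free, and it assumes the advantage mean is taken over actions separately for each quantile, which is a common choice for distributional dueling heads but is not confirmed by this README. The function names are hypothetical.

```python
from statistics import fmean


def dueling_quantiles(value, advantage):
    """Combine streams: out[a][i] = V[i] + A[a][i] - mean over actions of A[.][i].

    value:     list of n_quantiles floats (value stream output)
    advantage: list of n_actions lists, each with n_quantiles floats
    """
    n_actions = len(advantage)
    n_quantiles = len(value)
    mean_adv = [fmean(advantage[a][i] for a in range(n_actions))
                for i in range(n_quantiles)]
    return [[value[i] + advantage[a][i] - mean_adv[i] for i in range(n_quantiles)]
            for a in range(n_actions)]


def greedy_action(quantiles):
    """Pick the action whose quantile values have the highest mean (its Q-value)."""
    q_values = [fmean(q) for q in quantiles]
    return max(range(len(q_values)), key=q_values.__getitem__)


# Toy example with 2 quantiles and 3 actions (DoNothing, Repair, Replace).
toy = dueling_quantiles([0.0, 1.0], [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
best = greedy_action(toy)
```

Averaging the quantiles recovers an ordinary Q-value for action selection, while the individual quantiles remain available for risk statistics.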

Optimizations (v2.0)

  1. Prioritized Experience Replay (PER)

    • Priority-based sampling (α=0.6)
    • Importance sampling correction (β: 0.4→1.0)
  2. N-step Learning

    • Multi-step bootstrapping (n=3)
    • Accelerated credit assignment
  3. Mixed Precision Training (AMP)

    • FP16/FP32 mixed precision
    • GPU memory efficiency
  4. AsyncVectorEnv

    • 16 parallel environments
    • Up to 16× faster data collection
  5. Noisy Networks

    • Parameter space exploration
    • No ε-greedy needed
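The arithmetic behind the first two optimizations can be sketched in a few lines of plain Python. The α, β, and n values follow the list above; the discount factor `gamma` and all function names are assumptions made for illustration and do not come from `train_cbm_dqn_v2.py`.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Discounted sum of n rewards plus gamma**n times the bootstrap value."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g + (gamma ** len(rewards)) * bootstrap_value


def per_probabilities(priorities, alpha=0.6):
    """Sampling probability of each transition: p_i**alpha / sum_j p_j**alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights (N * P(i))**(-beta), normalized by the maximum."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    max_w = max(weights)
    return [w / max_w for w in weights]


# n=3 return from three rewards and a bootstrapped estimate of the 4th state.
g3 = n_step_return([1.0, 1.0, -10.0], bootstrap_value=5.0)
probs = per_probabilities([0.5, 1.0, 2.0])
weights = importance_weights(probs)
```

Annealing β from 0.4 toward 1.0 gradually strengthens the correction for the bias that prioritized sampling introduces.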

Real Equipment Data

Target Equipment:

  • Equipment: Boiler (40t), ID: 43175
  • Measurement: Temperature_SouthEast_Upper_Wall②, ID: 167473
  • Data points: 1,843 measurements

Transition Matrix (Estimated from Real Data):

P = [[0.2948, 0.7052],   # Normal → [Normal, Anomalous]
     [0.0731, 0.9269]]   # Anomalous → [Normal, Anomalous]

Characteristics:

  • Normal state is unstable (70% probability of moving to Anomalous)
  • Anomalous state is persistent (93% probability of remaining Anomalous)
  • Recovery is difficult (7% probability of returning to Normal)
  • Proactive maintenance intervention is crucial
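These characteristics can be checked directly from the matrix. The sketch below shows one plausible way to estimate a 2×2 transition matrix from a 0/1 state sequence by counting transitions, and how to derive the long-run state shares and mean time spent in each state. The function names are illustrative and are not taken from `data_preprocessor.py`.

```python
def estimate_transition_matrix(states):
    """Row-normalized transition counts for a sequence of 0 (Normal) / 1 (Anomalous)."""
    counts = [[0, 0], [0, 0]]
    for s, s_next in zip(states, states[1:]):
        counts[s][s_next] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


def stationary_distribution(P):
    """Long-run shares of a 2-state chain: pi_normal = p_AN / (p_NA + p_AN)."""
    p_na = P[0][1]  # Normal -> Anomalous
    p_an = P[1][0]  # Anomalous -> Normal
    pi_normal = p_an / (p_na + p_an)
    return pi_normal, 1.0 - pi_normal


def expected_sojourn(p_stay):
    """Mean number of consecutive steps in a state (geometric holding time)."""
    return 1.0 / (1.0 - p_stay)


P = [[0.2948, 0.7052],
     [0.0731, 0.9269]]
pi_normal, pi_anomalous = stationary_distribution(P)
steps_anomalous = expected_sojourn(P[1][1])
```

For the matrix above, the stationary share of Normal comes out at about 9.4%, in line with the state distribution reported by the preprocessing step, and an Anomalous episode lasts roughly 14 steps on average.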

Maintenance Scenarios

1. Safety First

  • risk_weight: 1.0
  • cost_lambda: 0.05
  • Strategy: Proactive maintenance, minimize equipment downtime
  • Result: Reward 8.45 (moderate, but unstable with high variance)

2. Balanced (Recommended)

  • risk_weight: 1.0
  • cost_lambda: 0.15
  • Strategy: Optimal balance between safety and cost
  • Result: Reward 24.31 (best performance, stable)

3. Cost Efficient

  • risk_weight: 0.3
  • cost_lambda: 0.5
  • Strategy: Minimize maintenance costs, tolerate equipment interruptions
  • Result: Reward -129.31 (catastrophic failure)

Lessons Learned

  • Optimal cost_lambda: 0.15 gave the best risk-cost balance among the tested scenarios
  • cost_lambda < 0.1: Over-maintenance, high variance
  • cost_lambda > 0.3: Under-maintenance, catastrophic failure
  • Low risk_weight: Prevents the agent from learning maintenance actions

See Scenario_Lessons.md for detailed analysis.

Visualization Outputs

Each scenario generates 7 detailed plots:

  1. training_history.png - Training progress (reward, loss, length, distribution)
  2. transition_matrix.png - State transition heatmap
  3. policy_evaluation.png - Test episode analysis
  4. distribution_statistics.png - Return distribution shapes
  5. uncertainty_analysis.png - Variance and IQR comparison
  6. risk_profile.png - VaR and CVaR analysis
  7. quantile_distributions.png - QR-DQN quantile functions

Comparison visualizations:

  • scenario_comparison.png - Learning curves and final performance
  • [scenario]_detailed.png - Per-scenario 2×2 subplot analysis

Advanced Features

Statistical Threshold Calculation

Automatically handles missing threshold values:

Smin = μ - k×σ  (default k=2.0)
Smax = μ + k×σ
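A minimal sketch of this fallback, assuming that a threshold missing from the equipment data is passed as `None` and that σ is the population standard deviation (the README does not say which estimator the repository uses). The function name `resolve_thresholds` is hypothetical.

```python
from statistics import fmean, pstdev


def resolve_thresholds(values, smin=None, smax=None, k=2.0):
    """Return (smin, smax), filling any missing bound with mu -/+ k*sigma."""
    mu = fmean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if smin is None:
        smin = mu - k * sigma
    if smax is None:
        smax = mu + k * sigma
    return smin, smax


# Both bounds missing: derive them from the measurements.
auto_min, auto_max = resolve_thresholds([10.0, 12.0, 14.0])
# Only the upper bound missing: keep the supplied lower bound.
kept_min, filled_max = resolve_thresholds([10.0, 12.0, 14.0], smin=5.0)
```

A larger `k` widens the band between the two thresholds, so fewer measurements fall outside it.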

Distribution Analysis

QR-DQN provides rich distributional information:

  • VaR (Value at Risk): 5th percentile of the return distribution
  • CVaR (Conditional Value at Risk): Expected return at or below the VaR
  • IQR (Interquartile Range): Measure of uncertainty
  • Full quantile distributions: 51 quantiles per action
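These statistics can be read off the quantile values of a single action. The sketch below is one possible approach and not necessarily how `visualize_results.py` computes them: it treats the quantile values as equally weighted samples of the return, takes the lowest ⌈α·n⌉ of them as the tail, and approximates the IQR with the stored quantiles closest to the 25% and 75% levels. The function name `risk_stats` is hypothetical.

```python
import math


def risk_stats(quantiles, alpha=0.05):
    """Return (VaR, CVaR, IQR) from equally weighted quantile values."""
    q = sorted(quantiles)
    n = len(q)
    k = max(1, math.ceil(alpha * n))  # number of quantiles in the lower tail
    tail = q[:k]
    var = tail[-1]                    # largest value inside the alpha-tail
    cvar = sum(tail) / len(tail)      # mean return within the tail
    iqr = q[(3 * n) // 4] - q[n // 4] # approximate 75% level minus 25% level
    return var, cvar, iqr


# Toy distribution with 10 quantile values and a 25% tail.
var, cvar, iqr = risk_stats([float(i) for i in range(1, 11)], alpha=0.25)
```

Because the CVaR averages only the worst outcomes, it is never higher than the VaR, which makes it the more conservative number for comparing maintenance actions.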

Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • Gymnasium 1.0+
  • NumPy, Pandas, Matplotlib
  • CUDA-capable GPU (recommended)

See requirements.txt for complete list.

References

  1. QR-DQN: Dabney et al. "Distributional Reinforcement Learning with Quantile Regression" (AAAI 2018)
  2. Noisy Networks: Fortunato et al. "Noisy Networks for Exploration" (ICLR 2018)
  3. Dueling DQN: Wang et al. "Dueling Network Architectures for Deep Reinforcement Learning" (ICML 2016)
  4. PER: Schaul et al. "Prioritized Experience Replay" (ICLR 2016)

License

MIT License

Contributing

Contributions welcome! Please feel free to submit a Pull Request.

Citation

If you use this code in your research, please cite:

@software{equipment_cbm_qrdqn,
  title={Equipment CBM with QR-DQN: Reinforcement Learning for Condition-Based Maintenance},
  author={Your Name},
  year={2025},
  url={https://github.com/YOUR_USERNAME/dql-equipment-cbm}
}

Acknowledgments

  • Base implementation adapted from base_markov-dqn-v09-quantile (bridge maintenance with 3×3 transition matrix)
  • Equipment data provided by industrial partner (confidential)

Created: December 21, 2025
Version: 2.0
