
GBM quantile regression is much slower than gaussian/tweedie #16867

Description

@ganeevsingh18

H2O version, Operating System and Environment

  • H2O version: 3.46.0.9
  • Local run: Ubuntu 22.04, OpenJDK 11, 20 logical CPUs, 15 GiB RAM
  • Distributed run: Docker Compose with 3 H2O nodes
  • Docker node config: 2 GB heap per node, nthreads=4 per node, OpenJDK 17
  • Not running on Kubernetes or Hadoop

I have tested this on H2O 3.46.0.9.

Actual behavior
H2OGradientBoostingEstimator(distribution="quantile") is much slower than gaussian and tweedie on the same regression dataset with near-identical GBM settings.

The slowdown already shows on a single-node run (about 2.2x slower per tree than gaussian) and becomes much larger on a 3-node Docker H2O cluster (about 4.5x slower per tree). Predictions and metrics are consistent across the local and Docker runs, so this looks like a performance issue rather than a correctness issue.

I originally noticed this while using GBM with quantile loss on a much larger work dataset, where training was slow enough to be nearly impractical. I cannot share the company dataset or environment, so I created this smaller reproducible setup to demonstrate the same pattern and to ask whether this is expected or can be improved.

Summary from profiling/results/cpu_vs_docker3_comparison_superconductivity.csv (included in the zip file):

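| model    | CPU JVM train seconds | CPU ms/tree | Docker 3-node JVM train seconds | Docker 3-node ms/tree |
|----------|----------------------:|------------:|--------------------------------:|----------------------:|
| gaussian | 62                    | 37.736      | 220                             | 133.901               |
| tweedie  | 59                    | 35.393      | 223                             | 133.773               |
| quantile | 166                   | 83.000      | 1216                            | 608.000               |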

Note: even with stopping_rounds=0, gaussian and tweedie stopped before reaching 2000 trees in these runs, while quantile built all 2000 trees. For that reason I compare ms/tree rather than only total training time.
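The ms/tree columns are consistent with train seconds x 1000 divided by the number of trees built. The small sketch below uses only the numbers from the table to back-compute the implied tree counts and the per-tree ratios; the helper names (implied_trees, per_tree_ratio, cluster_overhead) are illustrative and not part of the attached scripts.

```python
# (train seconds, ms/tree) pairs copied from the summary table above.
RESULTS = {
    "gaussian": {"cpu": (62, 37.736), "docker3": (220, 133.901)},
    "tweedie": {"cpu": (59, 35.393), "docker3": (223, 133.773)},
    "quantile": {"cpu": (166, 83.000), "docker3": (1216, 608.000)},
}


def implied_trees(train_seconds: float, ms_per_tree: float) -> int:
    """Back-compute trees built, assuming ms/tree = train_seconds * 1000 / trees."""
    return round(train_seconds * 1000 / ms_per_tree)


def per_tree_ratio(model: str, baseline: str, env: str) -> float:
    """How many times slower `model` is per tree than `baseline` in one environment."""
    return RESULTS[model][env][1] / RESULTS[baseline][env][1]


def cluster_overhead(model: str) -> float:
    """Docker 3-node ms/tree divided by single-node CPU ms/tree for one model."""
    return RESULTS[model]["docker3"][1] / RESULTS[model]["cpu"][1]


if __name__ == "__main__":
    for name, envs in RESULTS.items():
        trees = {env: implied_trees(*vals) for env, vals in envs.items()}
        print(f"{name:9s} trees={trees} cluster overhead={cluster_overhead(name):.2f}x")
    for env in ("cpu", "docker3"):
        ratio = per_tree_ratio("quantile", "gaussian", env)
        print(f"{env}: quantile vs gaussian per tree = {ratio:.2f}x")
```

Run as a script, it implies about 1643 trees for gaussian and 1667 for tweedie in both environments, a quantile/gaussian per-tree ratio of about 2.20x on the single node and 4.54x on the 3-node cluster, and a cluster-vs-single-node per-tree overhead of about 3.55x for gaussian versus 7.33x for quantile.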

Expected behavior
I expected quantile loss to be somewhat more expensive than gaussian/tweedie, but not to show this large a per-tree gap, especially with the same data, model settings, and tree budget.

If this is expected behavior, it would be useful to understand which part of the quantile implementation makes distributed GBM training more expensive and whether there are recommended settings to reduce the overhead.

I am filing this as a performance investigation rather than a confirmed bug. If maintainers think this belongs in Discussions instead, I am happy to move or reframe it.

Steps to reproduce
The reproduction scripts and outputs are in profiling/.

Run single-node CPU:

cd profiling
python run_cpu.py

Run Docker 3-node:

cd profiling/docker_3node
./run_docker_3node.sh

Generate comparison and flamegraph tables:

cd profiling
python compare_results.py
python analyze_flamegraphs.py

All three losses use the same GBM settings except for distribution-specific parameters:

ntrees=2000
max_depth=12
learn_rate=0.005
min_rows=1
nbins=100
sample_rate=1.0
col_sample_rate=1.0
score_tree_interval=10
stopping_rounds=0

Distributions tested:

distribution="gaussian"
distribution="tweedie", tweedie_power=1.5
distribution="quantile", quantile_alpha=0.5
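For reference, the configuration above can be expressed with the H2O Python API roughly as follows. This is a minimal sketch, not the contents of run_cpu.py: gbm_params and build_estimator are hypothetical helper names, and only the parameter values are taken from this issue.

```python
# Shared GBM settings listed above; only the distribution-specific
# arguments differ between the three runs.
COMMON_PARAMS = {
    "ntrees": 2000,
    "max_depth": 12,
    "learn_rate": 0.005,
    "min_rows": 1,
    "nbins": 100,
    "sample_rate": 1.0,
    "col_sample_rate": 1.0,
    "score_tree_interval": 10,
    "stopping_rounds": 0,
}

DISTRIBUTION_PARAMS = {
    "gaussian": {"distribution": "gaussian"},
    "tweedie": {"distribution": "tweedie", "tweedie_power": 1.5},
    "quantile": {"distribution": "quantile", "quantile_alpha": 0.5},
}


def gbm_params(name: str) -> dict:
    """Merge the shared settings with one distribution's extra arguments."""
    return {**COMMON_PARAMS, **DISTRIBUTION_PARAMS[name]}


def build_estimator(name: str):
    """Create an untrained H2O GBM (needs the h2o package; call h2o.init() before training)."""
    from h2o.estimators import H2OGradientBoostingEstimator  # imported lazily

    return H2OGradientBoostingEstimator(**gbm_params(name))
```

Usage, after h2o.init() and loading the frames: build_estimator("quantile").train(x=features, y=target, training_frame=train, validation_frame=valid), where features, target, train and valid are placeholders for the processed superconductivity data.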

I can provide a zip containing the reproduction scripts, Docker files, processed train/validation/test CSVs, result summaries, and selected flamegraphs.

Upload logs
No exception is thrown. I can upload H2O logs if useful, but the most relevant artifacts are probably:

  • profiling/results/cpu_vs_docker3_comparison_superconductivity.csv
  • profiling/results/combined_training_summary_superconductivity.csv
  • profiling/results/analysis/cpu_distribution_comparison.csv
  • CPU and Docker 3-node async-profiler flamegraphs for each distribution

Screenshots
N/A

Additional context
The profiling flamegraphs and summary CSVs are included in the reproduction zip (profiling.zip).

From the profiling output, the expensive part appears to be leaf assignment and/or the quantile aggregation that runs after rows are assigned to leaves. The strongest profiling signals are in:

  • hex/quantile/Quantile$Histo
  • hex/quantile/Quantile$StratifiedQuantilesTask
  • high MRTask activity in quantile runs
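A simplified sketch of why quantile leaf values can cost more than gaussian ones (this illustrates the general cost structure, not H2O's implementation). Under squared loss, the best constant for a leaf is the mean residual, which needs one pass of mergeable (sum, count) accumulators. Under pinball loss with quantile_alpha, the best constant is the alpha-quantile of the leaf's residuals, which is not a simple sum, so a distributed system typically narrows it down over several counting passes. Bisection is used below only for simplicity; the Quantile$Histo and StratifiedQuantilesTask frames suggest H2O uses a histogram-based refinement whose details may differ. If each pass is a separate MRTask, every pass would also pay cluster communication cost, which would be consistent with the larger gap on the 3-node cluster, but that is a hypothesis, not a confirmed cause.

```python
import math
from collections import defaultdict


def leaf_means(leaves, residuals):
    """Squared loss: the best constant per leaf is the mean residual.

    A single pass of (sum, count) accumulators suffices, and partial results
    from different chunks or nodes can be merged by plain addition.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for leaf, r in zip(leaves, residuals):
        sums[leaf] += r
        counts[leaf] += 1
    return {leaf: sums[leaf] / counts[leaf] for leaf in sums}


def quantile_by_bisection(values, alpha, tol=1e-6):
    """Pinball loss: the best constant is the alpha-quantile of the residuals.

    Estimates the lower empirical alpha-quantile (smallest v with at least
    ceil(alpha * n) values <= v) to within `tol`. Each counting step is one
    full pass over `values`, standing in for one distributed map-reduce pass.
    Returns (estimate, passes_used).
    """
    n = len(values)
    k = max(1, math.ceil(alpha * n))  # target rank
    # Invariant: count(v <= lo) < k <= count(v <= hi).
    lo, hi = min(values) - 1.0, max(values)
    passes = 1  # the initial min/max scan
    while hi - lo > tol:
        mid = (lo + hi) / 2
        passes += 1
        if sum(1 for v in values if v <= mid) >= k:
            hi = mid
        else:
            lo = mid
    return hi, passes
```

For example, with leaves [0, 0, 0, 1, 1, 1] and residuals [1.0, 2.0, 10.0, -3.0, 0.0, 4.0], leaf_means gives 13/3 for leaf 0 in one pass, while quantile_by_bisection([1.0, 2.0, 10.0], 0.5) converges to the median 2.0 only after more than 20 counting passes.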
