[ENH] M mode test for TimeXer v2 by andersendsa · Pull Request #2284 · sktime/pytorch-forecasting

andersendsa · 2026-05-18T20:43:32Z

Reference Issues/PRs

Resolves a codebase TODO.

What does this implement/fix? Explain your changes.

Added an M mode (multiple series) test to test_timexer_v2.py to resolve the actionable TODO comment requesting tests for multiple series forecasting now that the feature is successfully implemented.Improves test coverage and validates that the TimeXer v2 model correctly processes multi-target datasets.

What should a reviewer concentrate their feedback on?

Verify that the data format in the test aligns with the expected M mode usage.

Did you add any tests for the change?

Yes, added test_m_mode to tests/test_models/test_timexer_v2.py.

Any other comments?

PR checklist

The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
Added/modified tests
Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.
To run hooks independent of commit, execute pre-commit run --all-files

review-notebook-app · 2026-05-18T20:55:47Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

codecov · 2026-05-18T21:42:15Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@fd619aa). Learn more about missing BASE report.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2284   +/-   ##
=======================================
  Coverage        ?   87.04%           
=======================================
  Files           ?      166           
  Lines           ?     9773           
  Branches        ?        0           
=======================================
  Hits            ?     8507           
  Misses          ?     1266           
  Partials        ?        0

Flag	Coverage Δ
cpu	`87.04% <ø> (?)`
pytest	`87.04% <ø> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

andersendsa · 2026-05-19T03:04:30Z

Hi @phoeenniixx this pr passes all the tests and is ready for review

phoeenniixx · 2026-05-19T13:34:57Z

Please revert this change

phoeenniixx · 2026-05-19T13:36:32Z

+        dataset = TimeSeries(
+            data=df,
+            time="time_idx",
+            target=["target1", "target2"],  # M mode uses multiple targets
+            group=["group_id"],
+            num=["temperature", "humidity", "pressure", "wind_speed"],
+            known=["temperature", "humidity", "pressure", "wind_speed", "time_idx"],
+        )
+
+        data_module = TslibDataModule(
+            time_series_dataset=dataset,
+            batch_size=2,
+            context_length=12,
+            prediction_length=8,
+            train_val_test_split=(0.7, 0.15, 0.15),
+        )
+        data_module.setup()
+        metadata = data_module.metadata
+


why are we not using pkg class here? I think we could use that and then we dont need to create these classes or anything and everything would happen under the hood, we just need to pass configs?

phoeenniixx

Thanks!
I have added some suggestions, please have a look at them

andersendsa · 2026-05-19T14:58:55Z

Hi @phoeenniixx i have done the required changes and the pr is ready for review

phoeenniixx · 2026-05-20T11:56:06Z

+    pkg.datamodule = pkg._build_datamodule(dataset)
+    pkg.datamodule.setup(stage="fit")
+
+    pkg._build_model(metadata=pkg.datamodule.metadata, **pkg.model_cfg)
+
+    train_dataloader = pkg.datamodule.train_dataloader()
+    batch = next(iter(train_dataloader))[0]


I think if you call pkg.fit(), pkg.predict(), you wouldnt have to create the data module etc separately. Or is this intentional?

I think if you call pkg.fit(), pkg.predict(), you wouldnt have to create the data module etc separately. Or is this intentional?

Yes, avoiding pkg.fit() is intentional. For multiple targets (M mode), the dataloader returns targets as a list ([target1, target2]), but the underlying metric layer incorrectly unpacks lists as (target, weight), causing pkg.fit() to crash. By manually triggering the forward pass instead, we bypass this metric bug and successfully verify the model architecture handles M mode correctly.

This should be raised as an issue then! we should have a look at it. Can you please raise an issue and add your findings there? Please make sure to add some code to reproduce the error, so that we can diagnose it

phoeenniixx · 2026-05-28T07:24:10Z

Can you please explain how we are handling M tests without the pipeline able to handle the multiloss and multi-target completelty

andersendsa · 2026-05-31T08:42:47Z

Can you please explain how we are handling M tests without the pipeline able to handle the multiloss and multi-target completelty

Right now, the PyTorch Forecasting v2 training pipeline (i.e., pkg.fit()) crashes on M mode datasets because the metrics layer (specifically LightningMetric.update which MultiLoss uses) isn't designed to handle target being a list of multiple target tensors. It incorrectly tries to unpack [target1, target2] as (target, weight), which causes a shape/dimension mismatch.

Because we cannot train the model end-to-end with multiple targets using the built-in pkg.fit() pipeline, we have to bypass the metrics layer to test if the M mode architecture works. Here is how we are handling the M mode tests without full pipeline support:

Instantiate components manually (under the hood): We use TimeXer_pkg_v2 to hold the configurations, and instead of calling pkg.fit(), we explicitly call pkg._build_datamodule(dataset) and pkg._build_model(...). This gives us a fully initialized dataloader and a fully instantiated PyTorch model architecture.
Fetch a batch directly: We pull a single batch directly from the dataloader (batch = next(iter(train_dataloader))[0]).
Execute a forward pass without loss calculation: We put the model in evaluation mode (pkg.model.eval()) and execute the forward pass manually using output = pkg.model(batch) inside a torch.no_grad() context.
Verify the output shape: Because we bypassed the metric/loss computation completely, the model successfully produces the raw prediction tensors. We then assert that output["prediction"].shape == (batch_size, prediction_length, num_targets).

Summary: We are handling M mode testing by checking the model's core architecture and forward pass in isolation, bypassing the high-level Lightning metric/loss calculations that are currently bugged/unsupported for multi-target datasets. This proves the internal layers of TimeXer scale correctly to multiple targets, even though the surrounding training wrapper does not yet support it..

andersendsa · 2026-06-06T15:44:29Z

Hi @phoeenniixx are there any changes required in this pr or is this pr good to merge ?

andersendsa requested review from PranavBhatP, benHeid, fkiraly, fnhirwa, jdb78, phoeenniixx and yarnabrina as code owners May 18, 2026 20:43

phoeenniixx reviewed May 19, 2026

View reviewed changes

Comment thread docs/source/tutorials/stallion.ipynb

phoeenniixx May 19, 2026

Copy link
Copy Markdown

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this change

phoeenniixx reviewed May 19, 2026

View reviewed changes

phoeenniixx requested changes May 19, 2026

View reviewed changes

phoeenniixx added enhancement New feature or request module:test_framework labels May 19, 2026

andersendsa requested a review from phoeenniixx May 19, 2026 14:58

andersendsa added 6 commits May 20, 2026 13:56

Add M mode test

7d8d1e4

fomatting

119a8be

using pkg

f893725

reverting file

81ce968

formatting

4e06364

rebase

ac43a62

andersendsa force-pushed the M_mode branch from 6c1ebd4 to ac43a62 Compare May 20, 2026 08:31

phoeenniixx reviewed May 20, 2026

View reviewed changes

andersendsa mentioned this pull request May 23, 2026

[BUG] MultiLoss crashes for multi-target datasets in M-mode due to incorrect target unpacking #2291

Open

fkiraly changed the title ~~[ENH]Adding M mode test for TimeXer v2~~ [ENH] M mode test for TimeXer v2 May 26, 2026

Uh oh!

Conversation

andersendsa commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference Issues/PRs

What does this implement/fix? Explain your changes.

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

Any other comments?

PR checklist

Uh oh!

review-notebook-app Bot commented May 18, 2026

Uh oh!

codecov Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

andersendsa commented May 19, 2026

Uh oh!

phoeenniixx May 19, 2026

Choose a reason for hiding this comment

Uh oh!

phoeenniixx May 19, 2026

Choose a reason for hiding this comment

Uh oh!

phoeenniixx left a comment

Choose a reason for hiding this comment

Uh oh!

andersendsa commented May 19, 2026

Uh oh!

phoeenniixx May 20, 2026

Choose a reason for hiding this comment

Uh oh!

andersendsa May 20, 2026

Choose a reason for hiding this comment

Uh oh!

phoeenniixx May 21, 2026

Choose a reason for hiding this comment

Uh oh!

phoeenniixx commented May 28, 2026

Uh oh!

andersendsa commented May 31, 2026

Uh oh!

andersendsa commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

andersendsa commented May 18, 2026 •

edited

Loading

codecov Bot commented May 18, 2026 •

edited

Loading

andersendsa commented Jun 6, 2026 •

edited

Loading