Skip to content

[ENH] M mode test for TimeXer v2#2284

Open
andersendsa wants to merge 6 commits into
sktime:mainfrom
andersendsa:M_mode
Open

[ENH] M mode test for TimeXer v2#2284
andersendsa wants to merge 6 commits into
sktime:mainfrom
andersendsa:M_mode

Conversation

@andersendsa

@andersendsa andersendsa commented May 18, 2026

Copy link
Copy Markdown
Contributor

Reference Issues/PRs

Resolves a codebase TODO.

What does this implement/fix? Explain your changes.

Added an M mode (multiple series) test to test_timexer_v2.py to resolve the actionable TODO comment requesting tests for multiple series forecasting now that the feature is successfully implemented.Improves test coverage and validates that the TimeXer v2 model correctly processes multi-target datasets.

What should a reviewer concentrate their feedback on?

  • Verify that the data format in the test aligns with the expected M mode usage.

Did you add any tests for the change?

Yes, added test_m_mode to tests/test_models/test_timexer_v2.py.

Any other comments?

PR checklist

  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
  • Added/modified tests
  • Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.
    To run hooks independent of commit, execute pre-commit run --all-files

@review-notebook-app

Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@codecov

codecov Bot commented May 18, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@fd619aa). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2284   +/-   ##
=======================================
  Coverage        ?   87.04%           
=======================================
  Files           ?      166           
  Lines           ?     9773           
  Branches        ?        0           
=======================================
  Hits            ?     8507           
  Misses          ?     1266           
  Partials        ?        0           
Flag Coverage Δ
cpu 87.04% <ø> (?)
pytest 87.04% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andersendsa

Copy link
Copy Markdown
Contributor Author

Hi @phoeenniixx this pr passes all the tests and is ready for review

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this change

Comment thread tests/test_models/test_timexer_v2.py Outdated
Comment on lines +400 to +418
dataset = TimeSeries(
data=df,
time="time_idx",
target=["target1", "target2"], # M mode uses multiple targets
group=["group_id"],
num=["temperature", "humidity", "pressure", "wind_speed"],
known=["temperature", "humidity", "pressure", "wind_speed", "time_idx"],
)

data_module = TslibDataModule(
time_series_dataset=dataset,
batch_size=2,
context_length=12,
prediction_length=8,
train_val_test_split=(0.7, 0.15, 0.15),
)
data_module.setup()
metadata = data_module.metadata

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we not using pkg class here? I think we could use that and then we dont need to create these classes or anything and everything would happen under the hood, we just need to pass configs?

@phoeenniixx phoeenniixx left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
I have added some suggestions, please have a look at them

@andersendsa

Copy link
Copy Markdown
Contributor Author

Hi @phoeenniixx i have done the required changes and the pr is ready for review

Comment on lines +435 to +441
pkg.datamodule = pkg._build_datamodule(dataset)
pkg.datamodule.setup(stage="fit")

pkg._build_model(metadata=pkg.datamodule.metadata, **pkg.model_cfg)

train_dataloader = pkg.datamodule.train_dataloader()
batch = next(iter(train_dataloader))[0]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you call pkg.fit(), pkg.predict(), you wouldnt have to create the data module etc separately. Or is this intentional?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you call pkg.fit(), pkg.predict(), you wouldnt have to create the data module etc separately. Or is this intentional?

Yes, avoiding pkg.fit() is intentional. For multiple targets (M mode), the dataloader returns targets as a list ([target1, target2]), but the underlying metric layer incorrectly unpacks lists as (target, weight), causing pkg.fit() to crash. By manually triggering the forward pass instead, we bypass this metric bug and successfully verify the model architecture handles M mode correctly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be raised as an issue then! we should have a look at it. Can you please raise an issue and add your findings there? Please make sure to add some code to reproduce the error, so that we can diagnose it

@fkiraly fkiraly changed the title [ENH]Adding M mode test for TimeXer v2 [ENH] M mode test for TimeXer v2 May 26, 2026
@phoeenniixx

Copy link
Copy Markdown
Member

Can you please explain how we are handling M tests without the pipeline able to handle the multiloss and multi-target completelty

@andersendsa

Copy link
Copy Markdown
Contributor Author

Can you please explain how we are handling M tests without the pipeline able to handle the multiloss and multi-target completelty

Right now, the PyTorch Forecasting v2 training pipeline (i.e., pkg.fit()) crashes on M mode datasets because the metrics layer (specifically LightningMetric.update which MultiLoss uses) isn't designed to handle target being a list of multiple target tensors. It incorrectly tries to unpack [target1, target2] as (target, weight), which causes a shape/dimension mismatch.

Because we cannot train the model end-to-end with multiple targets using the built-in pkg.fit() pipeline, we have to bypass the metrics layer to test if the M mode architecture works. Here is how we are handling the M mode tests without full pipeline support:

  • Instantiate components manually (under the hood): We use TimeXer_pkg_v2 to hold the configurations, and instead of calling pkg.fit(), we explicitly call pkg._build_datamodule(dataset) and pkg._build_model(...). This gives us a fully initialized dataloader and a fully instantiated PyTorch model architecture.

  • Fetch a batch directly: We pull a single batch directly from the dataloader (batch = next(iter(train_dataloader))[0]).

  • Execute a forward pass without loss calculation: We put the model in evaluation mode (pkg.model.eval()) and execute the forward pass manually using output = pkg.model(batch) inside a torch.no_grad() context.

  • Verify the output shape: Because we bypassed the metric/loss computation completely, the model successfully produces the raw prediction tensors. We then assert that output["prediction"].shape == (batch_size, prediction_length, num_targets).

Summary: We are handling M mode testing by checking the model's core architecture and forward pass in isolation, bypassing the high-level Lightning metric/loss calculations that are currently bugged/unsupported for multi-target datasets. This proves the internal layers of TimeXer scale correctly to multiple targets, even though the surrounding training wrapper does not yet support it..

@andersendsa

andersendsa commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Hi @phoeenniixx are there any changes required in this pr or is this pr good to merge ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants