[ENH] Create a unified Test framework and a Base class for Data Modules in v2#2313
[ENH] Create a unified Test framework and a Base class for Data Modules in v2#2313Faakhir30 wants to merge 31 commits into
Conversation
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2313 +/- ##
=======================================
Coverage ? 87.57%
=======================================
Files ? 178
Lines ? 10197
Branches ? 0
=======================================
Hits ? 8930
Misses ? 1267
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Thanks! and welcome to pytorch-forecasting.
This looks nice!
- I think we need a dataset class in
BaseDataModule? Or not? - The
setupetc methods can be data module specific, the base class is not able to handle those, which is expected. So, I think we should not remove them, or is there a way to generalise them? - I think the data set of
EncoderDecoderTimeSeriesDataModulecould also go out of the class (likeTsLibDataModule)? - Please create separate folders for base, encoder-decoder and
tsliband then add those data modules and pkg class to that. - Also, for time being, we should keep the test files for both data modules, we need to make sure we dont miss any test coverage!
| return MyDataModule | ||
|
|
||
| @classmethod | ||
| def get_test_timeseries(cls, **kwargs): |
There was a problem hiding this comment.
Why are we importing the data scenarios here? I think that should happen inside the test framework and not at the pkg level?
There was a problem hiding this comment.
sure, we can uncouple this thing, but then we will have to maintain a map of datamodule batch_format tag to a D1 TimeSeries returning function used by tests.
currently done at _datamodules_config.py
Edit: should'nt pkg layer be responsible for describing "how to test" the class it encapsulates? was above comment made because of coupling test framework code _data_senarios to pkg layer?
| """Return decoder/prediction window length.""" | ||
|
|
||
| @abstractmethod | ||
| def _create_windows(self, indices: torch.Tensor) -> list[tuple[int, int, int, int]]: |
There was a problem hiding this comment.
If we have this method, why do we have self.train_windows etc in __init__, I think this method would return the windows no? We dont need to create them at the initialization, but only when we actually need it - during data loading etc.
There was a problem hiding this comment.
why do we have self.train_windows etc in init
At init attributes are just being assigned as None to avoid hasattr checks that data modules originilay had. e.g., if not hasattr(self, "test_dataset"):
but only when we actually need it - during data loading etc.
If this comment is suggesting to avoid saving datasets and windows as attributes at all, then reason behind saving them is caching behavior of windows/datasets in setup call, we wont be able to cache creation of dataset object without saving them.
| @@ -623,100 +577,6 @@ def _create_windows(self, indices: torch.Tensor) -> list[tuple[int, int, int, in | |||
|
|
|||
| return windows | |||
|
|
|||
| def setup(self, stage: str | None = None): | |||
There was a problem hiding this comment.
Why are we removing this setup? The setup of BaseDataModule is not enough and is very generic - which is a good thing. But the children could (and should) have their own specific setup when needed..
There was a problem hiding this comment.
The setup of BaseDataModule is not enough
For tslib and enc-dec data module, I think it is sufficient apart from spliting indices logic that we have removed for now.
My knowledge is limited to these two data modules as I have only seen these two, only thing different in setup of these two is dataset creation, and I do have that smaller abstract method _build_dataset overriden in both subclases.
So, atleast in these two cases, idts subclasses needs thier own setup.
There was a problem hiding this comment.
Hmm, I see, for these classes setup generalisation worked, but that would not be the case when we add FMs to this package - FMs tend to handle data in a bit different way. But that is not an issue, we would just override this default behaviour. For now, if we are able to generalise, then it is a good thing! Nice!
There was a problem hiding this comment.
I think it is sufficient apart from spliting indices logic that we have removed for now.
I think we could create some abstract methods for splitting and windowing and then the child classes create those? Would that make sense?
I think windowing can how they are handled can be different for different types of model families (or datamodules). What do you think @agobbifbk @Faakhir30 @fkiraly
There was a problem hiding this comment.
windowing already has abstractmethod _create_windows that is implemented by subclasses.
will do same for splitting in next commit, thanks
… TS mapping in test framework Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
…h-forecasting into data_modules_v2
Not required in current approach, i'm leaving
Addressed at #2313 (comment)
Done, tests will be failing for now as old files are having indices/windowing logic etc that we removed in newer design, thanks! |
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
If we are removing it from the base class, why not add them back to the child classes? |
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
52a1606 to
57ee73a
Compare
phoeenniixx
left a comment
There was a problem hiding this comment.
Thanks!
It is moving in a good direction!
I think we can reduce the redundancy here though. I think there are a lot of things that both these datamodules do in a very similar way. So, in the tests, they should be tested in a simiar way and under same tasks as well. In future, if some datamodule deviates from this pattern, we could add the new tests then
| ) | ||
|
|
||
|
|
||
| def make_tslib_timeseries(**kwargs): |
There was a problem hiding this comment.
why are we making two different scenarios for the data modules? I think they both should be able to handle the similar type of TimeSeries class?
Also, Is the already avaiable method to create TimeSeries not enough? Do we really need to add new methods for this?
There was a problem hiding this comment.
Also, Is the already avaiable method to create TimeSeries not enough? Do we really need to add new methods for this?
in data senarios, i dont see any function returning a single generic TimeSeries, there is one to return three timeseries (train, test, val)
I think they both should be able to handle the similar type of TimeSeries class?
originaly had two functions as test_tslib_data_module and test_data_module.py had seperate timeseries for testing, but yes, we can unify this to single timeseries, done
| EXCLUDE_DATA_MODULES = [ | ||
| "ClassName", | ||
| ] | ||
|
|
||
| EXCLUDED_TESTS = {} | ||
|
|
There was a problem hiding this comment.
would it be better to have tests:skip_by_name tag here? that is what we use for the test framework for the models
There was a problem hiding this comment.
I do like tag approach, it provides better dev experience, also original reason why we went in that direction in sktime repo. sktime/sktime#8496
-
As I read through codebase, I am seeing we are not documenting/maintaing tags in
registry/_tagsunlike sktime /skpro, is this intentional? or a future item? (i'd like to work on it if possible) -
I also see that ptf project has almost created convention of
EXCLUDED_TESTSin other pkgs testing framework too, we should addtests:skip_by_namesupport to all of them? (will open seperate PR for this, if maintainers approves, otherwise will fix for only data modules tests here)
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Signed-off-by: Faakhir30 <zahidfaakhir@gmail.com>
Reference Issues/PRs
Fixes #2298
What does this implement/fix? Explain your changes.
BaseTimeSeriesDataModulein_base_data_module.pywith shared init, split, setup template, metadata cache, dataloader factories (implement private abstract methods in child classes for DM specific logic)EncoderDecoderTimeSeriesDataModuleandTslibDataModuleto inherit base_BasePtDataModule,EncoderDecoderDataModule_pkg,TslibDataModule_pkg(tags and test param factories)test_all_data_modules.py+_datamodule_config.pyusingBaseFixtureGeneratorTestAllDataModulesintest_class_register.pytest_data_module.pyandtest_tslib_data_module.pyextension_templates/v2/datamodule/stubdata_moduelesDid you add any tests for the change?
Yes,
test_all_data_modules.pyPR checklist
pre-commit install.To run hooks independent of commit, execute
pre-commit run --all-files