
Handle jagged NestedTensor failures in model summary #21780

Open
furk4neg3 wants to merge 1 commit into Lightning-AI:master from furk4neg3:fix-jagged-nestedtensor-model-summary


@furk4neg3


What does this PR do?

Fixes #21588.

This PR handles a model summary crash when LightningModule.example_input_array is used with a model involving jagged NestedTensor operations.

In the reduced reproduction, a plain forward pass succeeds, but the same forward pass under PyTorch FLOP counting fails inside PyTorch's NestedTensor/FLOP-counter internals. The fix is limited to the model summary path and does not attempt to change PyTorch NestedTensor behavior.

Changes:

  • Detects the specific PyTorch FLOP-counter + NestedTensor failure case.
  • Warns once and retries summary forwarding without FLOP counting.
  • Preserves normal model summary behavior for regular tensor inputs.
  • Re-raises genuine forward errors if the retry still fails.
  • Adds regression tests for normal tensor summaries, jagged NestedTensor fallback behavior, and genuine forward errors.
  • Skips jagged-specific tests cleanly when the required PyTorch NestedTensor support is unavailable or when future PyTorch versions support this path without failing.

No breaking changes.

No documentation update is needed because this change only affects internal model summary fallback behavior.

Validation:

  • PYTHONPATH=src python -m pytest tests/tests_pytorch/utilities/test_model_summary.py -q
  • PYTHONPATH=src python -m pytest tests/tests_pytorch/callbacks/test_model_summary.py -q
  • PYTHONPATH=src python -m ruff check src/lightning/pytorch/utilities/model_summary/model_summary.py tests/tests_pytorch/utilities/test_model_summary.py
  • PYTHONPATH=src python -m ruff format --check src/lightning/pytorch/utilities/model_summary/model_summary.py tests/tests_pytorch/utilities/test_model_summary.py

@furk4neg3

Author

The Read the Docs failure appears unrelated to this PR. It fails during config validation because .readthedocs.yml still uses build.os: ubuntu-20.04, but Read the Docs now expects one of ubuntu-22.04, ubuntu-24.04, ubuntu-26.04, or ubuntu-lts-latest.

I didn’t touch .readthedocs.yml in this PR. Please let me know if you’d like me to update it to ubuntu-22.04 here, or if this should be handled separately.


Development

Successfully merging this pull request may close these issues.

The example_input_array in LightningModule does not support using NestedTensor with layout=torch.jagged within the model.
