fix(deepdoc): accept GFM table separators with one or more dashes #16319
Relax `_is_table_separator_row` to match valid `:--` / `--:` rows so Markdown tables are not fragmented row by row.
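A minimal sketch of the change, assuming the separator pattern is built from per-cell `:?-+:?` pieces. The names `OLD_SEPARATOR`, `NEW_SEPARATOR`, and `is_table_separator_row` are hypothetical, and the actual regex in `deepdoc/parser/markdown_parser.py` may be written differently. It illustrates why a `{3,}` quantifier rejects a valid GFM row such as `| :-- | --: |` while `+` accepts it.

```python
from __future__ import annotations

import re

# Hypothetical reconstruction for illustration; the real pattern in
# deepdoc/parser/markdown_parser.py may differ.
# One GFM delimiter cell: optional colon, hyphens, optional colon, optional padding.
_OLD_CELL = r"\s*:?-{3,}:?\s*"  # before the fix: three or more hyphens per cell
_NEW_CELL = r"\s*:?-+:?\s*"     # after the fix: one or more hyphens per cell


def _separator_pattern(cell: str) -> re.Pattern[str]:
    # Optional leading pipe, cells separated by pipes, optional trailing pipe.
    return re.compile(r"^\s*\|?" + cell + r"(?:\|" + cell + r")*\|?\s*$")


OLD_SEPARATOR = _separator_pattern(_OLD_CELL)
NEW_SEPARATOR = _separator_pattern(_NEW_CELL)


def is_table_separator_row(line: str, pattern: re.Pattern[str] = NEW_SEPARATOR) -> bool:
    """Return True if `line` looks like a GFM table delimiter row."""
    # Assumption of this sketch: require at least one pipe so a bare "---"
    # thematic break is not mistaken for a table separator.
    return "|" in line and pattern.match(line) is not None
```

With these definitions, `| :-- | --: |` passes the relaxed pattern but not the old one, while `|---|:---:|` passes both, so the change only widens what is accepted.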
No actionable comments were generated in the recent review.
Walkthrough
The Markdown table-separator regex now accepts one or more hyphens instead of requiring three or more, and a unit test verifies that a GFM table with a short separator row stays intact during extraction.
Changes: GFM Short Separator Fix
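To illustrate why the strict pattern fragments tables, here is a simplified block splitter that treats a pipe row followed by a separator row as the start of a table. `make_separator_check` and `split_blocks` are hypothetical, illustration-only names, not the deepdoc extractor. With a three-dash minimum, `| :-- | --: |` is not recognized and every row falls out as its own block; with a one-dash minimum, the table stays in one block, mirroring the behavior the new unit test is described as checking.

```python
from __future__ import annotations

import re
from typing import Callable


def make_separator_check(min_dashes: int) -> Callable[[str], bool]:
    """Build a GFM separator-row predicate requiring `min_dashes` hyphens per cell."""
    cell = r"\s*:?-{" + str(min_dashes) + r",}:?\s*"
    pattern = re.compile(r"^\s*\|?" + cell + r"(?:\|" + cell + r")*\|?\s*$")
    return lambda line: "|" in line and pattern.match(line) is not None


def split_blocks(lines: list[str], is_sep: Callable[[str], bool]) -> list[list[str]]:
    """Group lines so each table (header, separator, body) is one block and
    every other line is a block of its own."""
    blocks: list[list[str]] = []
    i = 0
    while i < len(lines):
        line = lines[i]
        # A table starts at a pipe row immediately followed by a separator row.
        if "|" in line and i + 1 < len(lines) and is_sep(lines[i + 1]):
            block = [line, lines[i + 1]]
            i += 2
            # Body rows continue while the next line is non-blank and has a pipe.
            while i < len(lines) and lines[i].strip() and "|" in lines[i]:
                block.append(lines[i])
                i += 1
            blocks.append(block)
        else:
            # Without a recognized separator, the row falls through as plain text.
            blocks.append([line])
            i += 1
    return blocks


doc = [
    "Intro text",
    "| Name | Score |",
    "| :-- | --: |",
    "| a | 1 |",
    "| b | 2 |",
    "",
    "Outro",
]
relaxed = split_blocks(doc, make_separator_check(1))
strict = split_blocks(doc, make_separator_check(3))
print(len(relaxed), len(strict))  # 4 7
```

Under the strict check the four table rows become four separate single-line blocks, which is the row-by-row fragmentation the PR describes.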
Estimated code review effort: 1 (Trivial), ~3 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
Could a maintainer add the
Actionable comments posted: 1
Inline comments:
In `@test/unit_test/deepdoc/parser/test_markdown_parser.py`:
- Around lines 156-166: the new test `test_custom_delimiter_preserves_gfm_short_separator_table` is missing the required pytest priority marker. Add the appropriate `p1`/`p2`/`p3` marker to it, following the conventions used elsewhere in `test_markdown_parser.py` and the other `test_*.py` files, so the new case is categorized consistently.
Files selected for processing (2): `deepdoc/parser/markdown_parser.py`, `test/unit_test/deepdoc/parser/test_markdown_parser.py`
Codecov Report
All modified and coverable lines are covered by tests.

| | main | #16319 | +/- |
|---|---|---|---|
| Coverage | 93.16% | 93.16% | |
| Files | 10 | 10 | |
| Lines | 717 | 717 | |
| Branches | 118 | 118 | |
| Hits | 668 | 668 | |
| Misses | 29 | 29 | |
| Partials | 20 | 20 | |
Summary
- Relax the `_is_table_separator_row` regex from `{3,}` dashes to `+` so valid GFM separators like `:--` and `--:` are recognized.

Test plan
- `.v/bin/python -m ruff check deepdoc/parser/markdown_parser.py test/unit_test/deepdoc/parser/test_markdown_parser.py`
- `.v/bin/python -m pytest test/unit_test/deepdoc/parser/test_markdown_parser.py::TestMarkdownElementExtractorTables::test_custom_delimiter_preserves_gfm_short_separator_table -q`

Fixes #16314