Fix infinite loop in MCP document metadata cache on empty docs page#16262
Fix infinite loop in MCP document metadata cache on empty docs page#16262manan-tech wants to merge 2 commits into
Conversation
`_get_document_metadata_cache` only advanced `page` and evaluated the stop condition inside the `code == 0 and data.docs` branch. A successful response carrying an empty `docs` list therefore left `page` unchanged and refetched the same page forever, hanging MCP retrieval until the client timed out (tens of thousands of identical `/documents?page=1` requests). - break out of pagination on an empty `docs` page, and on a non-zero response code (which previously also spun forever) - send `page_size` explicitly so the termination math matches the page size the server actually applies - correct the off-by-one so the final page is included instead of being dropped as soon as `page * page_size` first reached `total` Adds unit coverage for the empty-page break, the non-zero-code break, full and partial last-page pagination, and the explicit page_size query parameter. Fixes infiniflow#16248
📝 WalkthroughWalkthroughThe Document metadata pagination fix and tests
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@mcp/server/server.py`:
- Around line 375-384: Move the call to _set_cached_document_metadata_by_dataset
outside the pagination loop so that caching occurs only after the complete
dataset has been fetched, not after each page. Currently, the cache is written
on every iteration before checking termination conditions, so if pagination
fails or breaks early (due to error or incomplete response), the partial
doc_id_meta_list already in cache persists for the full TTL and causes
subsequent lookups to silently miss data from unfetched pages. Refactor by
removing the _set_cached_document_metadata_by_dataset call from inside the loop
and placing it after the pagination completes to ensure only complete metadata
lists are cached.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 90f70c47-ec06-46a4-abb2-3c7ea73c01bd
📒 Files selected for processing (2)
mcp/server/server.pytest/unit_test/mcp/test_document_metadata_pagination.py
Previously the cache was written on every successful page inside the pagination loop. If a multi-page dataset succeeded on page 1 but a later page failed (`not docs_res` or a non-zero `code`), the loop broke after the partial list had already been cached, and that incomplete list was then served for the full TTL with the unfetched pages silently missing. Track completion and write the cache once, after the loop ends cleanly (empty page or reported total reached). Failed/early-exit runs no longer cache, so the next call re-fetches instead of serving stale partial data. Adds a regression test covering a page-2 failure: the call still returns what it fetched, but nothing is cached.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/unit_test/mcp/test_document_metadata_pagination.py (1)
89-154: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winAdd pytest priority markers to the unmarked tests.
Line 89, Line 102, Line 115, Line 132, and Line 145 define tests without a
p1/p2/p3marker, while Line 156 is markedp2. Please mark these consistently so test-priority selection remains deterministic.Suggested update
+@pytest.mark.p2 `@pytest.mark.asyncio` async def test_empty_docs_page_breaks_instead_of_looping(monkeypatch, mcp_server): @@ +@pytest.mark.p2 `@pytest.mark.asyncio` async def test_nonzero_code_stops_pagination(monkeypatch, mcp_server): @@ +@pytest.mark.p2 `@pytest.mark.asyncio` async def test_paginates_through_exact_multiple_last_page(monkeypatch, mcp_server): @@ +@pytest.mark.p2 `@pytest.mark.asyncio` async def test_paginates_partial_last_page(monkeypatch, mcp_server): @@ +@pytest.mark.p2 `@pytest.mark.asyncio` async def test_request_includes_explicit_page_size(monkeypatch, mcp_server):As per coding guidelines,
**/test_*.py: Use pytest with priority level markers (p1/p2/p3) for Python tests.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/unit_test/mcp/test_document_metadata_pagination.py` around lines 89 - 154, The test function test_empty_docs_page_breaks_instead_of_looping is missing pytest priority markers while other test functions in this file (such as test_nonzero_code_stops_pagination, test_paginates_through_exact_multiple_last_page, test_paginates_partial_last_page, and test_request_includes_explicit_page_size) have `@pytest.mark.p2` decorators. Add the appropriate pytest priority marker decorators (matching the pattern used by the other tests in the file) before the test_empty_docs_page_breaks_instead_of_looping function definition to ensure consistent test-priority marking across all test functions in this file.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@test/unit_test/mcp/test_document_metadata_pagination.py`:
- Around line 89-154: The test function
test_empty_docs_page_breaks_instead_of_looping is missing pytest priority
markers while other test functions in this file (such as
test_nonzero_code_stops_pagination,
test_paginates_through_exact_multiple_last_page,
test_paginates_partial_last_page, and test_request_includes_explicit_page_size)
have `@pytest.mark.p2` decorators. Add the appropriate pytest priority marker
decorators (matching the pattern used by the other tests in the file) before the
test_empty_docs_page_breaks_instead_of_looping function definition to ensure
consistent test-priority marking across all test functions in this file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ee064113-0b67-40f9-a023-ed69f049b23a
📒 Files selected for processing (2)
mcp/server/server.pytest/unit_test/mcp/test_document_metadata_pagination.py
🚧 Files skipped from review as they are similar to previous changes (1)
- mcp/server/server.py
|
@coderabbitai the "outside diff range" note about missing pytest priority markers is a false positive. All six tests in |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: grep -n "pytest.mark.p2" test/unit_test/mcp/test_document_metadata_pagination.pyLength of output: 338
|
What
_get_document_metadata_cachein the MCP server can loop forever when the documents API returns a successful response (code == 0) with an emptydocslist. Thepage += 1and thepage = Nonestop condition both live inside theif code == 0 and data.docs:branch, so an empty page leavespageunchanged and the same/datasets/{id}/documents?page=1request is issued indefinitely. In a self-hosted v0.26.0 deployment this produced tens of thousands of identical requests within minutes and timed out MCP retrieval after 120s.Fixes #16248.
Changes
docspage — the core fix for the infinite loop.page_sizein the request so the termination math stays consistent with the page size the server actually applies (the documents endpoint defaults to 30; it is now explicit rather than assumed).total - page * page_size <= 0(evaluated after incrementingpage) dropped the last page — e.g. a dataset with exactly 60 documents only cached the first 30. This is the same problem the closed PR Ensure last page of documents is included in metadata cache #13141 set out to address.Tests
New
test/unit_test/mcp/test_document_metadata_pagination.py(5 cases):test_empty_docs_page_breaks_instead_of_loopingtest_nonzero_code_stops_paginationtest_paginates_through_exact_multiple_last_pagetest_paginates_partial_last_pagetest_request_includes_explicit_page_sizepage_sizeactually being sentAll 5 fail on
mainand pass on this branch:main, the empty-page and non-zero-code tests record dozens of repeatedGET /documents?page=1calls (a hard cap in the stub keeps the pre-fix loop from hanging CI), and the pagination tests cache only 30 of 60 / 30 of 45 documents;ruff checkandruff format --checkpass on the changed files.