Skip to content

Fix infinite loop in MCP document metadata cache on empty docs page#16262

Open
manan-tech wants to merge 2 commits into
infiniflow:mainfrom
manan-tech:fix/mcp-doc-metadata-empty-page-loop-16248
Open

Fix infinite loop in MCP document metadata cache on empty docs page#16262
manan-tech wants to merge 2 commits into
infiniflow:mainfrom
manan-tech:fix/mcp-doc-metadata-empty-page-loop-16248

Conversation

@manan-tech

Copy link
Copy Markdown
Contributor

What

_get_document_metadata_cache in the MCP server can loop forever when the documents API returns a successful response (code == 0) with an empty docs list. The page += 1 and the page = None stop condition both live inside the if code == 0 and data.docs: branch, so an empty page leaves page unchanged and the same /datasets/{id}/documents?page=1 request is issued indefinitely. In a self-hosted v0.26.0 deployment this produced tens of thousands of identical requests within minutes and timed out MCP retrieval after 120s.

Fixes #16248.

Changes

  • Break on an empty docs page — the core fix for the infinite loop.
  • Break on a non-zero response code (with a warning log) — this case previously also spun forever.
  • Send page_size in the request so the termination math stays consistent with the page size the server actually applies (the documents endpoint defaults to 30; it is now explicit rather than assumed).
  • Correct the off-by-one in the stop condition so the final page is included. The previous total - page * page_size <= 0 (evaluated after incrementing page) dropped the last page — e.g. a dataset with exactly 60 documents only cached the first 30. This is the same problem the closed PR Ensure last page of documents is included in metadata cache #13141 set out to address.

Tests

New test/unit_test/mcp/test_document_metadata_pagination.py (5 cases):

test guards against
test_empty_docs_page_breaks_instead_of_looping the infinite loop (#16248)
test_nonzero_code_stops_pagination a non-zero code spinning forever
test_paginates_through_exact_multiple_last_page dropping a full final page (60 docs at page_size 30)
test_paginates_partial_last_page dropping a partial final page (45 docs)
test_request_includes_explicit_page_size page_size actually being sent

All 5 fail on main and pass on this branch:

  • on main, the empty-page and non-zero-code tests record dozens of repeated GET /documents?page=1 calls (a hard cap in the stub keeps the pre-fix loop from hanging CI), and the pagination tests cache only 30 of 60 / 30 of 45 documents;
  • on this branch all 5 pass, and the full MCP unit suite (8 tests) is green.

ruff check and ruff format --check pass on the changed files.

`_get_document_metadata_cache` only advanced `page` and evaluated the
stop condition inside the `code == 0 and data.docs` branch. A successful
response carrying an empty `docs` list therefore left `page` unchanged
and refetched the same page forever, hanging MCP retrieval until the
client timed out (tens of thousands of identical `/documents?page=1`
requests).

- break out of pagination on an empty `docs` page, and on a non-zero
  response code (which previously also spun forever)
- send `page_size` explicitly so the termination math matches the page
  size the server actually applies
- correct the off-by-one so the final page is included instead of being
  dropped as soon as `page * page_size` first reached `total`

Adds unit coverage for the empty-page break, the non-zero-code break,
full and partial last-page pagination, and the explicit page_size query
parameter.

Fixes infiniflow#16248
@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jun 23, 2026
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

The _get_document_metadata_cache pagination loop in mcp/server/server.py is updated to pass an explicit page_size, break on non-zero backend code responses with a warning, stop on empty docs pages, and only write the cache after pagination completes cleanly. A new unit test module adds six async tests covering all termination and request-shape scenarios.

Document metadata pagination fix and tests

Layer / File(s) Summary
Pagination loop fix in _get_document_metadata_cache
mcp/server/server.py
Adds page_size to document requests, breaks on non-zero code with a warning, guards against infinite refetch on empty docs, replaces the off-by-one termination check with a fetched-count-meets-or-exceeds-total condition, and gates cache writes behind a pagination_complete flag.
Test infrastructure and helpers
test/unit_test/mcp/test_document_metadata_pagination.py
Dynamically loads the MCP server, provides minimal document record generation, and defines a fixture with a _stub_documents helper that monkeypatches _get to serve deterministic paginated responses with runaway-loop protection.
Pagination termination and request-shape tests
test/unit_test/mcp/test_document_metadata_pagination.py
Six async tests: empty docs stops after one request, non-zero code stops immediately, exact-multiple total fetches all pages cleanly, partial last page is fetched, outgoing requests carry page_size=30, and partial-failure scenarios return only successful pages without caching metadata.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

🐞 bug

Poem

🐇 Hop, hop, the loop's no more,
No fetching the same page forevermore.
page_size passed, the math corrected,
Empty docs? — the loop deflected!
Six bright tests now guard the gate,
Infinite refetch meets its fate. 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.38% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main fix: preventing an infinite loop in the document metadata cache when an empty docs page is returned.
Description check ✅ Passed The description comprehensively covers the problem, root cause, implementation details, and testing strategy, matching the template's requirements.
Linked Issues check ✅ Passed The PR directly addresses all requirements from issue #16248: breaks on empty docs pages, breaks on non-zero codes, includes explicit page_size, and corrects the off-by-one error.
Out of Scope Changes check ✅ Passed All code changes are within scope: they fix the pagination logic in _get_document_metadata_cache and add comprehensive unit tests validating the fixes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@mcp/server/server.py`:
- Around line 375-384: Move the call to _set_cached_document_metadata_by_dataset
outside the pagination loop so that caching occurs only after the complete
dataset has been fetched, not after each page. Currently, the cache is written
on every iteration before checking termination conditions, so if pagination
fails or breaks early (due to error or incomplete response), the partial
doc_id_meta_list already in cache persists for the full TTL and causes
subsequent lookups to silently miss data from unfetched pages. Refactor by
removing the _set_cached_document_metadata_by_dataset call from inside the loop
and placing it after the pagination completes to ensure only complete metadata
lists are cached.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 90f70c47-ec06-46a4-abb2-3c7ea73c01bd

📥 Commits

Reviewing files that changed from the base of the PR and between 11e14a8 and 53acb6b.

📒 Files selected for processing (2)
  • mcp/server/server.py
  • test/unit_test/mcp/test_document_metadata_pagination.py

Comment thread mcp/server/server.py Outdated
Previously the cache was written on every successful page inside the
pagination loop. If a multi-page dataset succeeded on page 1 but a later
page failed (`not docs_res` or a non-zero `code`), the loop broke after
the partial list had already been cached, and that incomplete list was
then served for the full TTL with the unfetched pages silently missing.

Track completion and write the cache once, after the loop ends cleanly
(empty page or reported total reached). Failed/early-exit runs no longer
cache, so the next call re-fetches instead of serving stale partial data.

Adds a regression test covering a page-2 failure: the call still returns
what it fetched, but nothing is cached.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/unit_test/mcp/test_document_metadata_pagination.py (1)

89-154: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add pytest priority markers to the unmarked tests.

Line 89, Line 102, Line 115, Line 132, and Line 145 define tests without a p1/p2/p3 marker, while Line 156 is marked p2. Please mark these consistently so test-priority selection remains deterministic.

Suggested update
+@pytest.mark.p2
 `@pytest.mark.asyncio`
 async def test_empty_docs_page_breaks_instead_of_looping(monkeypatch, mcp_server):
@@
+@pytest.mark.p2
 `@pytest.mark.asyncio`
 async def test_nonzero_code_stops_pagination(monkeypatch, mcp_server):
@@
+@pytest.mark.p2
 `@pytest.mark.asyncio`
 async def test_paginates_through_exact_multiple_last_page(monkeypatch, mcp_server):
@@
+@pytest.mark.p2
 `@pytest.mark.asyncio`
 async def test_paginates_partial_last_page(monkeypatch, mcp_server):
@@
+@pytest.mark.p2
 `@pytest.mark.asyncio`
 async def test_request_includes_explicit_page_size(monkeypatch, mcp_server):

As per coding guidelines, **/test_*.py: Use pytest with priority level markers (p1/p2/p3) for Python tests.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/unit_test/mcp/test_document_metadata_pagination.py` around lines 89 -
154, The test function test_empty_docs_page_breaks_instead_of_looping is missing
pytest priority markers while other test functions in this file (such as
test_nonzero_code_stops_pagination,
test_paginates_through_exact_multiple_last_page,
test_paginates_partial_last_page, and test_request_includes_explicit_page_size)
have `@pytest.mark.p2` decorators. Add the appropriate pytest priority marker
decorators (matching the pattern used by the other tests in the file) before the
test_empty_docs_page_breaks_instead_of_looping function definition to ensure
consistent test-priority marking across all test functions in this file.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@test/unit_test/mcp/test_document_metadata_pagination.py`:
- Around line 89-154: The test function
test_empty_docs_page_breaks_instead_of_looping is missing pytest priority
markers while other test functions in this file (such as
test_nonzero_code_stops_pagination,
test_paginates_through_exact_multiple_last_page,
test_paginates_partial_last_page, and test_request_includes_explicit_page_size)
have `@pytest.mark.p2` decorators. Add the appropriate pytest priority marker
decorators (matching the pattern used by the other tests in the file) before the
test_empty_docs_page_breaks_instead_of_looping function definition to ensure
consistent test-priority marking across all test functions in this file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ee064113-0b67-40f9-a023-ed69f049b23a

📥 Commits

Reviewing files that changed from the base of the PR and between 53acb6b and cc3d2ac.

📒 Files selected for processing (2)
  • mcp/server/server.py
  • test/unit_test/mcp/test_document_metadata_pagination.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • mcp/server/server.py

@manan-tech

Copy link
Copy Markdown
Contributor Author

@coderabbitai the "outside diff range" note about missing pytest priority markers is a false positive. All six tests in test/unit_test/mcp/test_document_metadata_pagination.py already carry @pytest.mark.p2 — at lines 87, 100, 113, 130, 143 and 156, each sitting directly above its @pytest.mark.asyncio decorator and async def. grep -c "pytest.mark.p2" returns 6 for 6 test functions, and the suite passes under --strict-markers. No change needed here.

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

grep -n "pytest.mark.p2" test/unit_test/mcp/test_document_metadata_pagination.py

Length of output: 338



@manan-tech You're absolutely right — all six @pytest.mark.p2 markers are confirmed at lines 87, 100, 113, 130, 143, and 156. My comment was a false positive. Sorry for the noise!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MCP document metadata cache can loop forever when /documents returns an empty docs page

1 participant