Skip to content

UN-3578 [FIX] GDrive folder browser no longer breaks on Google Docs#2093

Open
kirtimanmishrazipstack wants to merge 3 commits into
mainfrom
UN-3578-gdrive-authentication-wf-pipeline
Open

UN-3578 [FIX] GDrive folder browser no longer breaks on Google Docs#2093
kirtimanmishrazipstack wants to merge 3 commits into
mainfrom
UN-3578-gdrive-authentication-wf-pipeline

Conversation

@kirtimanmishrazipstack

@kirtimanmishrazipstack kirtimanmishrazipstack commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

What

  • The Google Drive connector's folder browser ("Folders to process") now loads folders even when they contain Google-native files (Docs / Sheets / Slides). It previously showed "No data".

Why

  • Customers couldn't pick a Drive folder for their ETL pipeline when that folder held a Google Doc or Sheet — the folder list failed to load and blocked pipeline setup (reported on-call).

How

  • get_files() (backend/file_management/file_management_helper.py) wraps the per-file fs.info() size backfill in try/except. pydrive2's info() runs an unguarded int(fileSize)int(None) for Google-native files (no fileSize), which aborted the entire listing; on failure we now log a warning and keep the ls() entry with size 0.

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

  • No — config-time folder-browser path only, and now strictly more resilient. Runtime ETL execution uses a separate, already-guarded path (walk()/ls()).

Database Migrations

  • None

Env Config

  • None

Relevant Docs

Related Issues or PRs

Dependencies Versions

  • None

Notes on Testing

  • Manual: in a GDrive connector, browse a folder containing a Google Doc/Sheet — folders now list instead of showing "No data".

Screenshots

Checklist

I have read and understood the Contribution Guidelines.

…ile lacks a size

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0d1a3550-2fa8-41b6-8824-47533f01c3fb

📥 Commits

Reviewing files that changed from the base of the PR and between 01fa47d and c208e1c.

📒 Files selected for processing (1)
  • backend/file_management/file_management_helper.py

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced file management robustness with improved error handling for file metadata retrieval. The system now gracefully handles failures when accessing file information during directory operations, logging warnings and continuing the listing process instead of interrupting the user experience. This ensures more reliable file browsing and management operations.

Walkthrough

In FileManagerHelper.get_files, the backfill logic for files with a None size is updated to wrap the fs.info() call in a try/except. On failure, it logs a warning and sets file_info["size"] = 0, preventing the exception from aborting the directory listing.

Changes

File Size Backfill Error Handling

Layer / File(s) Summary
Guarded fs.info() with warning and size=0 fallback
backend/file_management/file_management_helper.py
When a file entry has no size, fs.info() is now called inside a try/except; failures are logged as warnings and the size defaults to 0 so the listing continues.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main fix: preventing Google Drive folder browser from breaking when folders contain Google Docs/native files.
Description check ✅ Passed The description is comprehensive and follows the template, covering all critical sections including What, Why, How, impact assessment, and testing notes with clear explanations.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch UN-3578-gdrive-authentication-wf-pipeline

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps

greptile-apps Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a TypeError: int(None) crash in the Google Drive connector's folder browser that occurred when listing directories containing Google-native files (Docs, Sheets, Slides). The fix wraps the fs.info() size-backfill call in a try/except, falling back to size=0 and emitting a warning rather than aborting the entire directory listing.

  • Crash fix: fs.info() on a Google-native file triggers int(None) in pydrive2 since those files carry no fileSize; the new guard catches that and keeps the ls() entry with size=0, allowing the rest of the folder to render.
  • Scope: Change is confined to the folder-browser code path (get_files); ETL execution walks a separate, already-guarded path and is unaffected.

Confidence Score: 5/5

Safe to merge — the change is a minimal, well-scoped guard on a config-time folder-browser path with no impact on ETL execution.

The fix adds a single try/except around one fs.info() call, logs the caught exception with full context (as exc), and falls back to size=0. The affected code path is the folder browser only; ETL runtime walks a separate path. The exception binding and warning log give operators enough signal to distinguish a Google-native TypeError from a network or auth failure, so the change is both correct and observable.

No files require special attention.

Important Files Changed

Filename Overview
backend/file_management/file_management_helper.py Wraps fs.info() in try/except for Google-native file size backfill; falls back to size=0 with a warning log. Clean, targeted fix with exception bound to exc for observability.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_files called with file_path] --> B[fs.ls with detail=True]
    B --> C{For each file_info in listing}
    C --> D{file_info type == file AND size is None?}
    D -- No --> G[get ContentType]
    D -- Yes --> E[fs.info file_name]
    E -- Success --> F[file_info = new dict with size]
    E -- Exception TypeError int None or other error --> EX[logger.warning exc / file_info size = 0]
    F --> G
    EX --> G
    G --> H[mimetypes.guess_type if no ContentType]
    H --> I[Append FileInformation to file_list]
    I --> C
    C -- Done --> J[Return file_list]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[get_files called with file_path] --> B[fs.ls with detail=True]
    B --> C{For each file_info in listing}
    C --> D{file_info type == file AND size is None?}
    D -- No --> G[get ContentType]
    D -- Yes --> E[fs.info file_name]
    E -- Success --> F[file_info = new dict with size]
    E -- Exception TypeError int None or other error --> EX[logger.warning exc / file_info size = 0]
    F --> G
    EX --> G
    G --> H[mimetypes.guess_type if no ContentType]
    H --> I[Append FileInformation to file_list]
    I --> C
    C -- Done --> J[Return file_list]
Loading

Reviews (3): Last reviewed commit: "UN-3578 [FIX] Log fs.info() failure caus..." | Re-trigger Greptile

Comment thread backend/file_management/file_management_helper.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/file_management/file_management_helper.py`:
- Around line 89-92: The except block is catching too broad an Exception type
without logging the actual error, which can hide unexpected bugs. Replace the
generic Exception catch with more specific exception types related to fsspec or
connector operations (such as FileNotFoundError, PermissionError, or other
fsspec-specific exceptions), and modify the logger.warning call to include the
actual exception object as part of the log message so that the root cause is
captured while still applying the fallback metadata behavior for known issues.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c35df02b-54f9-4f14-bb99-b2efbfe381e1

📥 Commits

Reviewing files that changed from the base of the PR and between 64b06a7 and 01fa47d.

📒 Files selected for processing (1)
  • backend/file_management/file_management_helper.py

Comment thread backend/file_management/file_management_helper.py Outdated
@kirtimanmishrazipstack kirtimanmishrazipstack changed the title UN-3578 [FIX] Keep GDrive folder listing alive when a Google-native f… UN-3578 [FIX] GDrive folder browser no longer breaks on Google Docs Jun 19, 2026
@sonarqubecloud

Copy link
Copy Markdown

@github-actions

Copy link
Copy Markdown
Contributor

Unstract test results

Per-group results

Status Group Tier Passed Failed Errors Skipped Duration (s)
unit-connectors unit 64 12 0 3 17.0
unit-core unit 0 0 4 0 1.2
unit-platform-service unit 9 0 1 0 1.4
unit-prompt-service unit 15 0 0 0 20.3
unit-rig unit 53 0 0 0 3.5
unit-runner unit 11 0 0 0 3.2
unit-sdk1 unit 390 0 0 0 21.1
unit-tool-registry unit 0 0 1 0 1.3
unit-workers unit 0 0 0 0 17.9
TOTAL 542 12 6 3 86.9

Critical paths

⚠️ Critical paths not yet covered

  • auth-login — User can log in and obtain a session cookie. (entry: POST /api/v1/auth/login; declared coverage: no groups declared)
  • adapter-register-llm — Register and validate an LLM adapter. (entry: POST /api/v1/adapter/; declared coverage: no groups declared)
  • workflow-create-execute — Create a workflow, configure source+destination, execute, poll, fetch result. (entry: POST /api/v1/workflow/{id}/execute/; declared coverage: e2e-workflow)
  • api-deployment-run — Deploy a workflow as an API, POST a document, receive structured JSON. (entry: POST /deployment/api/{org}/{name}/; declared coverage: e2e-api-deployment)
  • prompt-studio-fetch-response — Prompt Studio: create project, add prompt, run single-pass, get response. (entry: POST /api/v1/prompt-studio/prompt-studio-tool/{id}/fetch_response/; declared coverage: e2e-prompt-studio)
  • pipeline-etl-execute — Run an ETL pipeline from source connector to destination. (entry: POST /api/v1/pipeline/{id}/execute/; declared coverage: no groups declared)
  • usage-token-tracking — Per-execution token usage is recorded and retrievable. (entry: GET /api/v1/usage/get_token_usage/; declared coverage: no groups declared)
  • workflow-execution-fan-out — Multi-file workflow execution fans out to file-processing workers and rejoins. (entry: internal: backend → rabbitmq → workers/file_processing; declared coverage: no groups declared)
  • callback-result-delivery — Async results are posted back via the callback worker. (entry: internal: workers/callback → backend /internal endpoints; declared coverage: no groups declared)
✅ Covered critical paths
  • tool-sandbox-exec — covered by unit-runner

file_name,
exc,
)
file_info["size"] = 0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kirtimanmishrazipstack do check if we expect a valid file size anywhere. Will 0 B show up in the UI then? Leaving it null might be better

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chandrasekharan-zipstack Checked — on this path size is display-only (serialized by FileInfoSerializer, rendered in the folder browser via formatBytes); nothing validates/sorts/gates selection on it. And null wouldn't change the display: formatBytes (GetStaticData.js:71) returns "0 B" for both 0 and null (if (!+bytes)). So 0 is harmless and keeps the size: int contract — only the stray Google-native doc shows "0 B", real files keep their size. Happy to switch to null if you'd rather the API say "unknown" explicitly, but the UI would still read "0 B" unless we also teach formatBytes to render unknown.

@chandrasekharan-zipstack chandrasekharan-zipstack left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still recommend leaving that size field blank rather than displaying 0 B. You can't distinguish between a failed file and one that is actually 0 B

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants