Skip to content

docs(checkpoint): fill in user-facing API docstrings + examples (re-land #6999)#7077

Open
rohitkulshreshtha wants to merge 1 commit into
mainfrom
rohit/docs/df-2081-reland-checkpoint-docstrings
Open

docs(checkpoint): fill in user-facing API docstrings + examples (re-land #6999)#7077
rohitkulshreshtha wants to merge 1 commit into
mainfrom
rohit/docs/df-2081-reland-checkpoint-docstrings

Conversation

@rohitkulshreshtha

@rohitkulshreshtha rohitkulshreshtha commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Re-land of #6999

#6999 was stacked on the observability branch (#6998) and merged into it — but #6998 was then closed in favor of #7026, which came from a different branch. Net effect: the docstrings never reached main despite the merged badge. This PR cherry-picks the reviewed commit onto main directly.

Original PR: #6999 (already approved + merged into the now-abandoned base branch)

Changes

  • daft.CheckpointStore: sink-side example (write_iceberg(checkpoint=)) alongside the source-side one; docstrings for list_checkpoints, get_checkpointed_files, mark_committed
  • daft.read_parquet: document checkpoint= / on=, Ray-only runner constraint, example
  • DataFrame.write_iceberg / write_deltalake: document checkpoint= with examples

🤖 Generated with Claude Code

Audit of docs.daft.ai showed the checkpoint feature was effectively
invisible — `CheckpointStore` methods had no docstrings and none of
the call sites had an example using `checkpoint=`. Fills the gaps:

* `CheckpointStore.list_checkpoints`, `get_checkpointed_files`,
  `mark_committed` get one-paragraph docstrings.
* `CheckpointStore` class docstring grows a sink-side example
  alongside the existing source-side one.
* `daft.read_parquet` gains a checkpoint-using example.
* `DataFrame.write_iceberg` and `DataFrame.write_deltalake` gain
  idempotent-commit examples showing the source/sink store-pairing.
* Each user-facing surface links to the Checkpointing user guide
  (use-case/checkpointing.md, shipped separately) via a `See Also`
  block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown

Rust Dependency Diff

Head: 6865166b154711bdf3602771c8c0bfb579d807a7 vs Base: 9f98230c7cb0a170b1c6158210f04023e052f7a3.

OK: Within budget.

  • New Crates: 0
  • Removed Crates: 0

@github-actions github-actions Bot added the docs label Jun 4, 2026
@rohitkulshreshtha rohitkulshreshtha requested a review from ohbh June 4, 2026 20:40
@rohitkulshreshtha rohitkulshreshtha enabled auto-merge (squash) June 4, 2026 20:41
@greptile-apps

greptile-apps Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR re-lands documentation-only changes (originally in #6999) that add user-facing docstrings and inline examples for the checkpoint API to main. No functional code is changed.

  • daft/checkpoint.py: Adds a sink-side example to CheckpointStore's class docstring and new method docstrings for list_checkpoints, get_checkpointed_files, and mark_committed; fixes one arg description ("sealed" → "checkpointed").
  • daft/dataframe/dataframe.py: Extends write_iceberg and write_deltalake docstrings with idempotent-commit examples and a See Also cross-reference.
  • daft/io/_parquet.py: Adds a checkpoint store example and See Also cross-reference to read_parquet.

Confidence Score: 4/5

Documentation-only change; no functional code is touched and the docstring additions accurately describe existing behaviour.

All changes are docstrings and inline examples. The small inconsistencies in # doctest: +SKIP markers and the non-standard Raises: format are worth tidying but do not affect runtime behaviour.

daft/checkpoint.py — doctest +SKIP consistency and Raises format in mark_committed.

Important Files Changed

Filename Overview
daft/checkpoint.py Added sink-side example to CheckpointStore class docstring and new docstrings for list_checkpoints, get_checkpointed_files, and mark_committed; minor inconsistency in doctest +SKIP markers and non-standard Raises format.
daft/dataframe/dataframe.py Added checkpoint store examples to write_iceberg and write_deltalake docstrings; trailing blank line inconsistency before closing quotes in write_iceberg vs write_deltalake.
daft/io/_parquet.py Added checkpoint store example and See Also section to read_parquet; all doctest lines consistently marked with +SKIP.

Sequence Diagram

sequenceDiagram
    participant User
    participant read_parquet
    participant CheckpointStore
    participant sink as write_iceberg / write_deltalake

    User->>read_parquet: "checkpoint=CheckpointConfig(store, on="file_id")"
    read_parquet->>CheckpointStore: anti-join: skip already-committed keys
    read_parquet-->>User: filtered DataFrame

    User->>sink: "checkpoint=IdempotentCommit(store, idempotence_key)"
    sink->>CheckpointStore: get_checkpointed_files() — recover if crashed
    sink->>sink: commit to catalog (Iceberg snapshot / Delta commit)
    sink->>CheckpointStore: mark_committed(checkpoint_ids)
    sink-->>User: done
Loading

Reviews (1): Last reviewed commit: "docs(checkpoint): fill in user-facing AP..." | Re-trigger Greptile

Comment thread daft/checkpoint.py

Sink-side (idempotent commit). Pair the same store into the source and
the sink so the pipeline's checkpointed keys are visible at commit time:
>>> store = daft.CheckpointStore("s3://bucket/ckpt")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing # doctest: +SKIP on store construction line

The new sink-side example's first line, store = daft.CheckpointStore("s3://bucket/ckpt"), has no # doctest: +SKIP marker, while the equivalent lines in every other file changed by this PR do (_parquet.py line 73, dataframe.py lines 1310 and 1761). If this doctest runs in a context where daft is not pre-imported, the line will raise a NameError. The source-side example immediately above has the same pre-existing gap on its first two lines.

Comment thread daft/checkpoint.py
Comment on lines +113 to +115
Raises:
Errors if any id is still ``Staged`` (not yet ``checkpoint()``'d)
or unknown to the store.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The Raises: section uses free-form prose instead of the standard ExceptionType: description format expected by Sphinx and most doc-generation tooling. This prevents renderers from linking exceptions and auto-generating consistent API tables. If the exact exception type from the Rust layer isn't known, at minimum using RuntimeError or ValueError as a placeholder keeps the section parseable.

Suggested change
Raises:
Errors if any id is still ``Staged`` (not yet ``checkpoint()``'d)
or unknown to the store.
Raises:
RuntimeError: If any id is still ``Staged`` (not yet ``checkpoint()``'d)
or unknown to the store.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +1320 to 1324
See Also:
[Checkpointing guide](../use-case/checkpointing.md) for the
conceptual overview of the ``checkpoint=`` parameter.

"""

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The See Also section in write_iceberg has a trailing blank line before the closing """, while the equivalent section in write_deltalake (added in the same PR) does not. Consistent formatting keeps docstring style uniform and avoids unnecessary whitespace in rendered docs.

Suggested change
See Also:
[Checkpointing guide](../use-case/checkpointing.md) for the
conceptual overview of the ``checkpoint=`` parameter.
"""
See Also:
[Checkpointing guide](../use-case/checkpointing.md) for the
conceptual overview of the ``checkpoint=`` parameter.
"""

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@blacksmith-sh

blacksmith-sh Bot commented Jun 4, 2026

Copy link
Copy Markdown

Found 3 test failures on Blacksmith runners:

Failures

Test View Logs
test_shuffle_reproducible_with_seed[False-arrow] View Logs
test_shuffle_reproducible_with_seed[True-arrow] View Logs
coverage: platform linux, python 3.10/12-final-0 View Logs

Fix in Cursor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants