docs(checkpoint): fill in user-facing API docstrings + examples (re-land #6999) #7077
rohitkulshreshtha wants to merge 1 commit into
Audit of docs.daft.ai showed the checkpoint feature was effectively invisible: `CheckpointStore` methods had no docstrings and none of the call sites had an example using `checkpoint=`. This PR fills the gaps:

* `CheckpointStore.list_checkpoints`, `get_checkpointed_files`, `mark_committed` get one-paragraph docstrings.
* `CheckpointStore` class docstring grows a sink-side example alongside the existing source-side one.
* `daft.read_parquet` gains a checkpoint-using example.
* `DataFrame.write_iceberg` and `DataFrame.write_deltalake` gain idempotent-commit examples showing the source/sink store-pairing.
* Each user-facing surface links to the Checkpointing user guide (use-case/checkpointing.md, shipped separately) via a `See Also` block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rust Dependency Diff
Head: ✅ OK: Within budget.
Greptile Summary
This PR re-lands documentation-only changes (originally in #6999) that add user-facing docstrings and inline examples for the checkpoint API to
Confidence Score: 4/5
Documentation-only change; no functional code is touched and the docstring additions accurately describe existing behaviour. All changes are docstrings and inline examples. There are small inconsistencies in `daft/checkpoint.py`: doctest `+SKIP` consistency and the `Raises` format in `mark_committed`.
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant read_parquet
participant CheckpointStore
participant sink as write_iceberg / write_deltalake
User->>read_parquet: "checkpoint=CheckpointConfig(store, on="file_id")"
read_parquet->>CheckpointStore: anti-join: skip already-committed keys
read_parquet-->>User: filtered DataFrame
User->>sink: "checkpoint=IdempotentCommit(store, idempotence_key)"
sink->>CheckpointStore: get_checkpointed_files() — recover if crashed
sink->>sink: commit to catalog (Iceberg snapshot / Delta commit)
sink->>CheckpointStore: mark_committed(checkpoint_ids)
sink-->>User: done
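The flow in the diagram can be sketched with a toy in-memory model. This is not Daft's implementation: `ToyCheckpointStore`, `read_with_checkpoint`, `write_idempotent`, `committed_keys`, `is_committed`, and the `crash_before_commit` flag are hypothetical names invented for illustration, and the signatures here are assumptions. Only the method names `get_checkpointed_files` and `mark_committed` are borrowed from the PR. The sketch assumes that retrying with the same idempotence key reuses the files recorded by the earlier attempt instead of writing new ones.

```python
class ToyCheckpointStore:
    """Hypothetical in-memory store; not Daft's CheckpointStore."""

    def __init__(self):
        # checkpoint id -> {"keys": set, "files": list, "committed": bool}
        self._entries = {}

    def checkpoint(self, checkpoint_id, keys, files):
        # Record which source keys were processed and which files hold them.
        self._entries[checkpoint_id] = {
            "keys": set(keys),
            "files": list(files),
            "committed": False,
        }

    def committed_keys(self):
        return {k for e in self._entries.values() if e["committed"] for k in e["keys"]}

    def is_committed(self, checkpoint_id):
        entry = self._entries.get(checkpoint_id)
        return entry is not None and entry["committed"]

    def get_checkpointed_files(self, checkpoint_id):
        # Files recorded by an earlier attempt, or None if there was none.
        entry = self._entries.get(checkpoint_id)
        return None if entry is None else list(entry["files"])

    def mark_committed(self, checkpoint_ids):
        for cid in checkpoint_ids:
            if cid not in self._entries:
                raise RuntimeError(f"unknown checkpoint id: {cid!r}")
            self._entries[cid]["committed"] = True


def read_with_checkpoint(rows, store, on):
    # Source side: anti-join against keys whose commit is already recorded.
    done = store.committed_keys()
    return [row for row in rows if row[on] not in done]


def write_idempotent(rows, store, catalog, on, idempotence_key, crash_before_commit=False):
    # Sink side: nothing to do if this key was committed or there is no input.
    if store.is_committed(idempotence_key) or not rows:
        return
    files = store.get_checkpointed_files(idempotence_key)
    if files is None:
        # First attempt: "write" one data file and checkpoint it.
        files = [f"{idempotence_key}.parquet"]
        store.checkpoint(idempotence_key, {row[on] for row in rows}, files)
    if crash_before_commit:
        raise RuntimeError("simulated crash before the catalog commit")
    catalog.extend(files)  # stand-in for the Iceberg snapshot / Delta commit
    store.mark_committed([idempotence_key])


store, catalog = ToyCheckpointStore(), []
source = [{"file_id": "a"}, {"file_id": "b"}]

# Attempt 1 crashes after the checkpoint but before the catalog commit.
try:
    rows = read_with_checkpoint(source, store, "file_id")
    write_idempotent(rows, store, catalog, "file_id", "run-1", crash_before_commit=True)
except RuntimeError:
    pass

# The retry reuses the checkpointed file for the same idempotence key.
rows = read_with_checkpoint(source, store, "file_id")
write_idempotent(rows, store, catalog, "file_id", "run-1")

# A later run only sees the input that has not been committed yet.
source.append({"file_id": "c"})
remaining = read_with_checkpoint(source, store, "file_id")
write_idempotent(remaining, store, catalog, "file_id", "run-2")
```

In this toy model the crashed attempt leaves the catalog untouched, the retry commits the recorded file exactly once, and the later run processes only the new key.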
Reviews (1): Last reviewed commit: "docs(checkpoint): fill in user-facing AP..."
    Sink-side (idempotent commit). Pair the same store into the source and
    the sink so the pipeline's checkpointed keys are visible at commit time:
    >>> store = daft.CheckpointStore("s3://bucket/ckpt")
Missing `# doctest: +SKIP` on store construction line
The new sink-side example's first line, `store = daft.CheckpointStore("s3://bucket/ckpt")`, has no `# doctest: +SKIP` marker, while the equivalent lines in every other file changed by this PR do (`_parquet.py` line 73, `dataframe.py` lines 1310 and 1761). If this doctest runs in a context where `daft` is not pre-imported, the line will raise a `NameError`. The source-side example immediately above has the same pre-existing gap on its first two lines.
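The effect of the marker can be checked with the standard-library `doctest` module. In this minimal sketch, `load_store` is a hypothetical, deliberately undefined name standing in for a call whose module has not been imported; the examples run with empty globals to mimic that situation.

```python
import doctest

# Same example twice: once unmarked, once with the SKIP directive.
UNMARKED = '''
>>> store = load_store("s3://bucket/ckpt")
'''

MARKED = '''
>>> store = load_store("s3://bucket/ckpt")  # doctest: +SKIP
'''


def count_failures(docstring):
    """Run the examples in one docstring with empty globals; return the failure count."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(docstring, globs={}, name="example", filename="<example>", lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report so only the count is returned.
    result = runner.run(test, out=lambda text: None)
    return result.failed


unmarked_failures = count_failures(UNMARKED)  # NameError is reported as a failure
marked_failures = count_failures(MARKED)      # the example is never executed
```

The unmarked example fails because the name cannot be resolved, while the marked one is skipped before execution.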
    Raises:
        Errors if any id is still ``Staged`` (not yet ``checkpoint()``'d)
        or unknown to the store.
The `Raises:` section uses free-form prose instead of the standard `ExceptionType: description` format expected by Sphinx and most doc-generation tooling. This prevents renderers from linking exceptions and auto-generating consistent API tables. If the exact exception type from the Rust layer isn't known, at minimum using `RuntimeError` or `ValueError` as a placeholder keeps the section parseable.
Suggested change:
    - Raises:
    -     Errors if any id is still ``Staged`` (not yet ``checkpoint()``'d)
    -     or unknown to the store.
    + Raises:
    +     RuntimeError: If any id is still ``Staged`` (not yet ``checkpoint()``'d)
    +     or unknown to the store.
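To illustrate why the `ExceptionType: description` shape is easier for tooling, here is a small sketch. `mark_committed_sketch`, `mark_committed_prose`, and `documented_exceptions` are hypothetical helpers, not Daft code, and `RuntimeError` is the reviewer's placeholder rather than a confirmed exception type. The regex-based extractor is a deliberately naive stand-in for what a docstring parser does.

```python
import inspect
import re


def mark_committed_sketch(checkpoint_ids, known_ids, staged_ids):
    """Mark checkpoints as committed (hypothetical stand-in, not Daft's method).

    Args:
        checkpoint_ids: Ids to mark as committed.
        known_ids: Ids the toy store knows about.
        staged_ids: Subset of ``known_ids`` that is still ``Staged``.

    Returns:
        The ids that were marked, in input order.

    Raises:
        RuntimeError: If any id is still ``Staged`` or unknown to the store.
    """
    for cid in checkpoint_ids:
        if cid not in known_ids:
            raise RuntimeError(f"unknown checkpoint id: {cid!r}")
        if cid in staged_ids:
            raise RuntimeError(f"checkpoint {cid!r} is still Staged")
    return list(checkpoint_ids)


def mark_committed_prose():
    """Same section written as free-form prose.

    Raises:
        Errors if any id is still ``Staged`` or unknown to the store.
    """


def documented_exceptions(func):
    """Return exception names listed as ``Name: description`` under ``Raises:``."""
    doc = inspect.getdoc(func) or ""
    section = re.search(r"^Raises:\n((?:[ \t]+.*\n?)+)", doc, flags=re.MULTILINE)
    if section is None:
        return []
    return re.findall(r"^[ \t]+([A-Za-z_][\w.]*): ", section.group(1), flags=re.MULTILINE)


structured = documented_exceptions(mark_committed_sketch)
free_form = documented_exceptions(mark_committed_prose)
```

With the structured form the extractor finds an exception name; with the prose form it finds nothing to link.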
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
    See Also:
        [Checkpointing guide](../use-case/checkpointing.md) for the
        conceptual overview of the ``checkpoint=`` parameter.

    """
The `See Also` section in `write_iceberg` has a trailing blank line before the closing `"""`, while the equivalent section in `write_deltalake` (added in the same PR) does not. Consistent formatting keeps docstring style uniform and avoids unnecessary whitespace in rendered docs.
Suggested change:
    - See Also:
    -     [Checkpointing guide](../use-case/checkpointing.md) for the
    -     conceptual overview of the ``checkpoint=`` parameter.
    -
    - """
    + See Also:
    +     [Checkpointing guide](../use-case/checkpointing.md) for the
    +     conceptual overview of the ``checkpoint=`` parameter.
    + """
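How much the extra blank line shows up downstream depends on the tool reading the docstring. The sketch below compares two hypothetical docstring bodies (the text is invented for illustration) before and after `inspect.cleandoc`, which strips leading and trailing blank lines. It assumes a consumer that normalizes docstrings this way; whether Daft's documentation build does so is not stated in this PR.

```python
import inspect

# Hypothetical docstring bodies that differ only in the trailing blank line.
WITH_TRAILING_BLANK = """Write to a table.

    See Also:
        Checkpointing guide.

    """

WITHOUT_TRAILING_BLANK = """Write to a table.

    See Also:
        Checkpointing guide.
    """

# The raw strings differ, so source-level style checks can tell them apart.
raw_equal = WITH_TRAILING_BLANK == WITHOUT_TRAILING_BLANK

# After normalization the trailing blank line is gone from both.
clean_equal = inspect.cleandoc(WITH_TRAILING_BLANK) == inspect.cleandoc(WITHOUT_TRAILING_BLANK)
```

Under that assumption the difference is a source-consistency matter: the raw `__doc__` text differs, while the normalized text is identical.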
Re-land of #6999
#6999 was stacked on the observability branch (#6998) and merged into it, but #6998 was then closed in favor of #7026, which came from a different branch. Net effect: the docstrings never reached `main` despite the merged badge. This PR cherry-picks the reviewed commit onto `main` directly.

Original PR: #6999 (already approved + merged into the now-abandoned base branch)
Changes
* `daft.CheckpointStore`: sink-side example (`write_iceberg(checkpoint=)`) alongside the source-side one; docstrings for `list_checkpoints`, `get_checkpointed_files`, `mark_committed`
* `daft.read_parquet`: document `checkpoint=` / `on=`, Ray-only runner constraint, example
* `DataFrame.write_iceberg` / `write_deltalake`: document `checkpoint=` with examples

🤖 Generated with Claude Code