feat: asof_join_aligned distributed#7107
Draft
colin-ho wants to merge 2 commits into
Draft
Conversation
…vers _assume_sorted_and_aligned=True now zips input partitions by index inside AsofJoinNode instead of sampling and range-shuffling, validating partition counts at execution time. Both paths share the carryover and join-dispatch machinery. Carryovers are also fixed for by-joins (a latent bug in the shuffle path): the per-bucket top_n(1) reduction kept only the lexicographically extreme group's row, losing other groups' cross-partition matches. Reductions are now per-group extremes (window max/min of the on-key over the by-keys), and each partition's join task receives the extremes of all preceding/following partitions, so matches survive empty and group-sparse partitions. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Rust Dependency DiffHead: ✅ OK: Within budget.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7107 +/- ##
==========================================
- Coverage 76.29% 75.92% -0.37%
==========================================
Files 1157 1157
Lines 164609 164815 +206
==========================================
- Hits 125586 125138 -448
- Misses 39023 39677 +654
🚀 New features to boost your workflow:
|
search_nearest offers each right row only to its floor/ceil left rows, and nearest_fill skipped any row already holding a direct match — so the first left row in a gap kept the backward candidate even when forward was strictly closer (left=[593, 597], right=[577, 608]: 593 matched 577, dist 16, instead of 608, dist 15). nearest_fill now reconciles every row against the two nearest distinct neighboring matches in each direction (two levels, because duplicate on-keys share a match and would otherwise shadow the other gap endpoint), keeping the nearer per is_nearer. The aligned suite's nearest differential tests flip from xfail to hard assertions. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Replaces #7073, which GitHub permanently locked when its base branch (
euan/asof-join-aligned) was deleted on #7072's merge. Same branch, rebased onto main.Implements the shuffle-skipping execution path for
join_asof(..., _assume_sorted_and_aligned=True)against #7072's test suite.AsofJoinAlignedNode,AsofJoinNodetakes anassume_alignedflag: the aligned path zips input partitions by index (validating partition counts at execution time) and skips sampling + range shuffle entirely. Both paths share the carryover and join-dispatch machinery.top_n(1)reduction kept only one row per bucket (the lexicographically extreme(by, on)tuple), losing other groups' cross-partition matches. Carryovers are now per-group extreme rows, and each partition's join task receives the extremes of all preceding/following partitions, so matches survive empty and group-sparse partitions. feat: tests for aligned asof join #7072's ≥3-partition tests fail without this and pass with it.Note: for very high cardinality
bykeys over many partitions, shipping all per-partition extremes to every join task is O(P²·G) rows worst case; a cumulative per-group merge (O(P·G), but a sequential task chain) is a profile-first follow-up.🤖 Generated with Claude Code