
feat: asof_join_aligned distributed#7107

Draft
colin-ho wants to merge 2 commits into main from euan/asof-join-aligned-distributed

@colin-ho

Collaborator

Replaces #7073, which GitHub permanently locked when its base branch (euan/asof-join-aligned) was deleted on #7072's merge. Same branch, rebased onto main.

Implements the shuffle-skipping execution path for join_asof(..., _assume_sorted_and_aligned=True) against #7072's test suite.

  • Instead of a separate AsofJoinAlignedNode, AsofJoinNode takes an assume_aligned flag: the aligned path zips input partitions by index (validating partition counts at execution time) and skips sampling + range shuffle entirely. Both paths share the carryover and join-dispatch machinery.
  • Fixes a latent carryover bug in the shuffle path along the way: the per-bucket top_n(1) reduction kept only one row per bucket (the lexicographically extreme (by, on) tuple), losing other groups' cross-partition matches. Carryovers are now per-group extreme rows, and each partition's join task receives the extremes of all preceding/following partitions, so matches survive empty and group-sparse partitions. The ≥3-partition tests from #7072 fail without this fix and pass with it.
  • Re-enables the partition-count-mismatch validation test.
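To make the carryover fix concrete, here is a minimal Python sketch of the two reductions. It is an illustration under assumptions, not the code in this PR: rows are modeled as `(by, on)` tuples, and `top_n_1`, `per_group_max`, and `backward_match` are hypothetical helper names.

```python
# Rows are (by, on) tuples. All names here are hypothetical illustrations.

def top_n_1(rows):
    """Single-row reduction as described above: keep only the lexicographic max (by, on)."""
    return [max(rows)] if rows else []

def per_group_max(rows):
    """Per-group reduction: keep the largest `on` value for every `by` group."""
    best = {}
    for by, on in rows:
        if by not in best or on > best[by]:
            best[by] = on
    return sorted(best.items())

def backward_match(carry, by, on):
    """Largest carried `on` for group `by` that is <= `on`, or None if there is none."""
    candidates = [o for b, o in carry if b == by and o <= on]
    return max(candidates) if candidates else None

# Four partitions: one is empty and one holds only group "b".
partitions = [
    [("a", 1), ("b", 2)],
    [],
    [("b", 4)],
    [("a", 7)],
]

# Everything that precedes the last partition.
preceding = [row for part in partitions[:3] for row in part]

single_row_carry = top_n_1(preceding)       # only group "b" survives
per_group_carry = per_group_max(preceding)  # one extreme row per group

lost = backward_match(single_row_carry, "a", 7)
found = backward_match(per_group_carry, "a", 7)
```

With a single-row carryover, the left row `("a", 7)` has no backward match because only group `b` was carried. With per-group extremes collected from all preceding partitions, it matches `("a", 1)` across the empty and group-sparse partitions in between.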

Note: for very high cardinality by keys over many partitions, shipping all per-partition extremes to every join task is O(P²·G) rows worst case; a cumulative per-group merge (O(P·G), but a sequential task chain) is a profile-first follow-up.
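The gap between the two bounds can be estimated with a rough row count. The sketch below assumes a single join direction, P partitions, and at most G distinct by-groups per partition; both function names are hypothetical and neither models the actual scheduler.

```python
def rows_shipped_all_to_all(num_partitions, num_groups):
    """Worst case: partition i receives up to G rows from each of its i predecessors."""
    return sum(i * num_groups for i in range(num_partitions))

def rows_shipped_cumulative(num_partitions, num_groups):
    """Running merge: every partition after the first receives one merged table of at most G rows."""
    return max(num_partitions - 1, 0) * num_groups

all_to_all = rows_shipped_all_to_all(100, 1000)   # grows as P^2 * G
cumulative = rows_shipped_cumulative(100, 1000)   # grows as P * G
```

For 100 partitions and 1,000 groups this gives 4,950,000 rows against 99,000. The cumulative variant ships fewer rows but forces each merge to wait on the previous one, which is the sequential task chain mentioned in the note.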

🤖 Generated with Claude Code

…vers

_assume_sorted_and_aligned=True now zips input partitions by index inside
AsofJoinNode instead of sampling and range-shuffling, validating partition
counts at execution time. Both paths share the carryover and join-dispatch
machinery.
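A minimal Python sketch of zipping by index with a partition-count check follows. The function name, the list-of-lists partition model, and the `ValueError` are assumptions for illustration; the real logic lives in the Rust `AsofJoinNode`.

```python
def zip_aligned_partitions(left_parts, right_parts):
    """Pair left and right partitions by index, refusing mismatched counts."""
    if len(left_parts) != len(right_parts):
        raise ValueError(
            f"aligned asof join needs equal partition counts, "
            f"got {len(left_parts)} and {len(right_parts)}"
        )
    # No sampling and no range shuffle: partition i joins against partition i.
    return list(zip(left_parts, right_parts))

pairs = zip_aligned_partitions([[1, 2], [5]], [[0, 3], [4]])

try:
    zip_aligned_partitions([[1]], [[0], [2]])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

The check has to run at execution time because partition counts are only known once both inputs have been materialized.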

Carryovers are also fixed for by-joins (a latent bug in the shuffle path):
the per-bucket top_n(1) reduction kept only the lexicographically extreme
group's row, losing other groups' cross-partition matches. Reductions are now
per-group extremes (window max/min of the on-key over the by-keys), and each
partition's join task receives the extremes of all preceding/following
partitions, so matches survive empty and group-sparse partitions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 11, 2026


Rust Dependency Diff

Head: 42251eb3f672eb4b882b5313a4bdd297bc13de76 vs Base: 525c393b085daddd05adbefada7672b300c096cd.

OK: Within budget.

  • New Crates: 0
  • Removed Crates: 0

@github-actions github-actions Bot added the feat label Jun 11, 2026
@codecov

codecov Bot commented Jun 11, 2026


Codecov Report

❌ Patch coverage is 2.94118% with 165 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.92%. Comparing base (f1f4dd2) to head (7583ae4).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ft-distributed/src/pipeline_node/join/asof_join.rs | 1.19% | 165 Missing ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7107      +/-   ##
==========================================
- Coverage   76.29%   75.92%   -0.37%     
==========================================
  Files        1157     1157              
  Lines      164609   164815     +206     
==========================================
- Hits       125586   125138     -448     
- Misses      39023    39677     +654     
| Files with missing lines | Coverage Δ |
|---|---|
| ...uted/src/pipeline_node/join/translate_asof_join.rs | 97.53% <100.00%> (+0.09%) ⬆️ |
| ...ft-distributed/src/pipeline_node/join/asof_join.rs | 10.07% <1.19%> (-4.28%) ⬇️ |

... and 40 files with indirect coverage changes

search_nearest offers each right row only to its floor/ceil left rows, and
nearest_fill skipped any row already holding a direct match — so the first
left row in a gap kept the backward candidate even when forward was strictly
closer (left=[593, 597], right=[577, 608]: 593 matched 577, dist 16, instead
of 608, dist 15).

nearest_fill now reconciles every row against the two nearest distinct
neighboring matches in each direction (two levels, because duplicate on-keys
share a match and would otherwise shadow the other gap endpoint), keeping the
nearer per is_nearer. The aligned suite's nearest differential tests flip
from xfail to hard assertions.
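The example in this commit message can be checked against a brute-force reference. The sketch below is a hypothetical oracle, not the `search_nearest`/`nearest_fill` implementation: it scans every right key for each left key, and ties go to whichever right key comes first, which may differ from the real `is_nearer` rule.

```python
def nearest_reference(left_keys, right_keys):
    """For each left key, return the right key at the smallest absolute distance."""
    return [
        min(right_keys, key=lambda r: abs(r - l), default=None)
        for l in left_keys
    ]

matches = nearest_reference([593, 597], [577, 608])
distances = [abs(m - l) for l, m in zip([593, 597], matches)]
```

Here 593 pairs with 608 (distance 15) rather than 577 (distance 16), which is the result the fixed `nearest_fill` is expected to produce.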

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>