Fix flaky AND-only MATCH queries #111

Open
SnobbyDragon wants to merge 1 commit into pinterest:pinterest-integration-4.1.0-rc from SnobbyDragon:pinterest-integration-4.1.0-rc

SnobbyDragon commented Jun 9, 2026

Why I'm doing:

StarRocks would sometimes return the error "Match can only used as a pushdown predicate on column with GIN in a single query." for valid AND-only MATCH queries. The same query would also sometimes succeed.

The behaviour was flaky because the error only occurred when a segment was empty (either initialized empty or emptied by pruning via ranges/bitmap/indices/sampling/etc.).

If the segment was empty, we would skip deleting the MATCH predicate during _apply_inverted_index and proceed to _rewrite_predicates, which would then see the leftover MATCH predicate and raise the error.
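The failure path described above can be sketched with a small model. This is only an illustration under stated assumptions: `SegmentScan`, `Predicate`, `Status`, and `init_buggy` are hypothetical stand-ins, and the two free functions only mimic the roles of `_apply_inverted_index` and `_rewrite_predicates` as described in this PR, not their actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the real StarRocks types.
enum class PredKind { Compare, Match };

struct Predicate {
    PredKind kind;
    std::string column;
};

struct Status {
    bool ok;
    std::string msg;
};

struct SegmentScan {
    std::vector<Predicate> predicates;
    std::size_t remaining_rows;  // rows left after range/bitmap/index pruning
};

// Mimics the role of _apply_inverted_index: MATCH predicates are consumed by
// the index lookup, but an empty segment returns early and leaves them behind.
void apply_inverted_index(SegmentScan& scan) {
    if (scan.remaining_rows == 0) {
        return;  // nothing to scan, so the MATCH predicate is never deleted
    }
    auto& preds = scan.predicates;
    preds.erase(std::remove_if(preds.begin(), preds.end(),
                               [](const Predicate& p) { return p.kind == PredKind::Match; }),
                preds.end());
}

// Mimics the role of _rewrite_predicates: a MATCH predicate that is still
// present at this point is rejected.
Status rewrite_predicates(const SegmentScan& scan) {
    for (const auto& p : scan.predicates) {
        if (p.kind == PredKind::Match) {
            return {false,
                    "Match can only used as a pushdown predicate on column with GIN in a single query."};
        }
    }
    return {true, ""};
}

// Pre-fix flow: the rewrite always runs, even for an empty segment.
Status init_buggy(SegmentScan scan) {
    apply_inverted_index(scan);
    return rewrite_predicates(scan);
}
```

In this model the same valid predicate list succeeds when `remaining_rows` is non-zero and fails when it is zero, which matches the flakiness reported here.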

What I'm doing:

Check whether the segment is empty before attempting to rewrite the predicates.
If the segment is empty, we don't need to rewrite because there are no rows anyway.

With this fix, it is possible that an invalid MATCH query would return 0 rows instead of raising an error.
This case should be very rare, because it requires every segment to be empty after pruning, so that predicate rewriting is never attempted and the MATCH error is never raised. We can fix this in a later PR; the current issue, which raises errors for valid queries, is the bigger problem.
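The guard and its caveat can be sketched as follows. This is a minimal model, not the actual patch: `Segment`, `Query`, `Status`, `init_segment`, and `run_query` are hypothetical names, and `match_is_valid` is an assumed flag standing in for "the MATCH predicate can be served by a GIN index".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model; names do not mirror the real StarRocks classes.
struct Status {
    bool ok;
    std::string msg;
};

struct Segment {
    std::size_t remaining_rows;  // rows left after pruning
};

struct Query {
    bool has_match;       // the query contains a MATCH predicate
    bool match_is_valid;  // assumed flag: MATCH can be served by a GIN index
};

// Per-segment init with the guard: an empty segment returns before the rewrite.
Status init_segment(const Segment& seg, const Query& q) {
    bool match_pending = q.has_match;
    if (seg.remaining_rows > 0 && q.match_is_valid) {
        match_pending = false;  // the index lookup consumed the MATCH predicate
    }
    if (seg.remaining_rows == 0) {
        return {true, ""};  // guard: no rows, so skip the predicate rewrite
    }
    if (match_pending) {
        return {false,
                "Match can only used as a pushdown predicate on column with GIN in a single query."};
    }
    return {true, ""};
}

// A query fails as soon as any segment reports an error.
Status run_query(const std::vector<Segment>& segments, const Query& q) {
    for (const auto& seg : segments) {
        Status st = init_segment(seg, q);
        if (!st.ok) {
            return st;
        }
    }
    return {true, ""};
}
```

Under these assumptions, a valid query over a mix of empty and non-empty segments succeeds, an invalid query still fails as long as one segment has rows, and an invalid query whose segments are all empty returns success with no rows, which is the rare case called out above.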

Test Plan

Manual testing with the test cases in this doc: https://docs.google.com/document/d/14Sh09fW9gxYKOUcFi6ZnjlovPv4D5PZRV5fvr82iEuw/edit?usp=sharing

Inverted index integration tests pass locally, except for test_clone_for_gin, which requires multiple hosts and is expected to fail locally.

case num: 22
15   : sql/test_inverted_index/R/test_inverted_index:test_alter_gin_col_into_other_type ... ok
12   : sql/test_inverted_index/R/test_inverted_index:test_alter_replicated_storage ... ok
0    : sql/test_inverted_index/R/test_inverted_index:test_basic_create_index ... ok
8    : sql/test_inverted_index/R/test_inverted_index:test_clone_for_gin ... FAIL
9    : sql/test_inverted_index/R/test_inverted_index:test_complex_predicate_for_gin ... ok
14   : sql/test_inverted_index/R/test_inverted_index:test_create_mv_with_match ... ok
10   : sql/test_inverted_index/R/test_inverted_index:test_delete_and_column_prune ... ok
13   : sql/test_inverted_index/R/test_inverted_index:test_disable_global_dict_rewrite ... ok
6    : sql/test_inverted_index/R/test_inverted_index:test_gin_index_compaction ... ok
19   : sql/test_inverted_index/R/test_inverted_index:test_gin_index_csv ... ok
20   : sql/test_inverted_index/R/test_inverted_index:test_gin_index_csv_malformed ... ok
5    : sql/test_inverted_index/R/test_inverted_index:test_gin_index_multiple_predicate_english ... ok
4    : sql/test_inverted_index/R/test_inverted_index:test_gin_index_multiple_predicate_none ... ok
3    : sql/test_inverted_index/R/test_inverted_index:test_gin_index_single_predicate_english ... ok
2    : sql/test_inverted_index/R/test_inverted_index:test_gin_index_single_predicate_none ... ok
7    : sql/test_inverted_index/R/test_inverted_index:test_gin_index_type ... ok
16   : sql/test_inverted_index/R/test_inverted_index:test_gin_match_empty ... ok
17   : sql/test_inverted_index/R/test_inverted_index:test_gin_view ... ok
1    : sql/test_inverted_index/R/test_inverted_index:test_query_gin_index ... ok
21   : sql/test_inverted_index/R/test_tokenize:test_tokenize_function ... ok
11   : sql/test_inverted_index/R/test_inverted_index:test_upper_case_column_name ... ok
18   : sql/test_inverted_index/R/test_inverted_index:test_vertical_compaction ... ok
======================================================================
FAIL: 8    : sql/test_inverted_index/R/test_inverted_index:test_clone_for_gin

...
Ran 22 tests in 294.792s
FAILED (failures=1)

test_scan also succeeds:

[DIR]: sql/test_scan
[Mode]: VALIDATE
[file regex]: .*
[case regex]: ^test_predicate_late_materialization$
case num: 1
0    : sql/test_scan/R/test_scan_predicate_late_materialization:test_predicate_late_materialization ... ok
----------------------------------------------------------------------
Ran 1 test in 1.431s
OK

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5
    • 3.4

Co-authored-by: Cursor <cursoragent@cursor.com>
SnobbyDragon changed the title from "Fix flaky MATCH queries" to "Fix flaky AND-only MATCH queries" Jun 9, 2026
MeghaS14 added a commit that referenced this pull request Jun 12, 2026
Product (cc @Angela Li):
- StarRocks StarRocks#73168: OR+MATCH GIN fallback refactor (OrMatchFallbackVisitor,
  rowid buffer reuse, _wrap_or_match_predicates_for_fallback, tests).
- pinterest #108: partial update on GIN tables (SegmentFileMark + SQL test).
- pinterest #111: guard empty scan range before predicate rewrite.
- StarRocks StarRocks#68311: base OR+MATCH inverted-index fallback (StarRocks#73168 builds on).

Build / 4.1 pipeline:
- Thrift 0.23 (thirdparty/vars.sh, Java libthrift) for dev-env-ubuntu:4.1-latest.
- BE thrift_util: TOutput::instance() for Thrift 0.23.
- Java deps aligned with 4.1.0-rc01: AWS 2.29.52, Hadoop 3.4.1, Netty
  4.1.128.Final; FE and common-runtime use AWS bundle only for consistent fe/lib.
- trivy.yaml: Hadoop jar pin for scans.

Co-authored-by: Cursor <cursoragent@cursor.com>