[Parquet] Adaptive Parquet Predicate Pushdown #8733
The version without synthetic page:

😮 thank you @hhhizzz -- I plan to review this PR carefully, but it will likely take me a few days

fyi @zhuqi-lucas and @XiangpengHao

🤖: Benchmark completed
As for how I arrived at the average selector length at which to use the mask, here are some statistics. You can check out (https://github.com/hhhizzz/arrow-rs/tree/rowselectionempty-charts) and run One column
alamb
left a comment
First of all, thank you so much @hhhizzz -- I think this is a really nice change and the code is well structured and a pleasure to read. Also thank you to @zhuqi-lucas for setting the stage for much of this work
Given the performance results so far (basically as good as or better than the existing code) I think this PR is almost ready to go
The only thing I am not sure about is the null page / skipping thing -- I left more comments inline
I think there are several additional improvements that could be done as follow on work:
- The heuristic for when to use the masking strategy can likely be improved based on the types of values being filtered (for example the number of columns or the inclusion of StringView)
- Avoid creating a RowSelection just to turn it back into a BooleanArray (I left comments inline)
I took a quick look. Whilst I think orchestrating this skipping at the RecordReader level does have a certain elegance, it runs into the issue that the masked selections aren't necessarily page-aligned.
By definition the mask selection strategy requests rows that weren't part of the original selection. The problem is that this could result in requesting rows for pages that we know are irrelevant. In some cases this just results in wasted IO; however, when using prefetching IO systems (such as AsyncParquetReader) this results in errors. I'm not a big fan of the hack of creating empty pages.
I think a better solution would be to ensure we only construct MaskChunks that don't cross page boundaries. Ideally this would be done on a per-leaf column basis, but tbh I suspect just doing it globally would probably work just fine.
Edit: If one was feeling fancy, one could ignore page boundaries where both pages were present in the original selection, although in practice I suspect this would not make a huge difference.
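To make the page-alignment suggestion concrete, here is a minimal sketch of splitting a row range into chunks that never straddle a page boundary. It is an illustration only, not code from this PR: `page_aligned_chunks`, `page_starts`, and `max_chunk` are hypothetical names, and it assumes the first row index of each page is already known (for example, derived from the offset index) and is applied globally rather than per leaf column.

```rust
use std::ops::Range;

/// Split `rows` into chunks of at most `max_chunk` rows such that no chunk
/// crosses a page boundary.
///
/// `page_starts` is a hypothetical input: the first row index of every page,
/// sorted ascending.
fn page_aligned_chunks(
    rows: Range<usize>,
    page_starts: &[usize],
    max_chunk: usize,
) -> Vec<Range<usize>> {
    assert!(max_chunk > 0, "max_chunk must be non-zero");
    let mut chunks = Vec::new();
    let mut start = rows.start;
    while start < rows.end {
        // Index of the first page boundary strictly after `start`, if any.
        let idx = page_starts.partition_point(|&p| p <= start);
        let next_boundary = page_starts.get(idx).copied().unwrap_or(usize::MAX);
        // Stop at whichever comes first: the end of the requested range,
        // the next page boundary, or the chunk size limit.
        let end = rows
            .end
            .min(next_boundary)
            .min(start.saturating_add(max_chunk));
        chunks.push(start..end);
        start = end;
    }
    chunks
}
```

With chunks built this way, a mask chunk can only ever request rows from a single page, so a page that the original selection skipped is never touched.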
ad51d87 to
ed51620
The benchmarks are looking great. Thank you (again @hhhizzz) I have been working on the DataFusion 51 release but then I will get back to this one. I expect we'll be able to get this one in this week ❤️
32c06d2 to
25bcdca
Thank you! I think the only thing that needs to be discussed is #8733 (comment)
# Conflicts: # parquet/src/arrow/async_reader/mod.rs
alamb
left a comment
TLDR is thank you so much for this PR @hhhizzz -- I think it is a significant step forward
I have several ideas for how to make it better (some additional doc strings, potentially rename some things), but I think we can do them as follow on PRs (I'll make some PRs to target this branch as well so you can see what I have in mind)
use crate::schema::types::SchemaDescriptor;

use crate::arrow::arrow_reader::metrics::ArrowReaderMetrics;
// Exposed so integration tests and benchmarks can temporarily override the threshold.
I think there are other use cases too (aka I am not sure this comment is accurate anymore)
.build();
.build_limited();

let preferred_strategy = plan_builder.preferred_selection_strategy();
Elsewhere in the PR you used the term "resolved" to reflect the chosen strategy -- perhaps we can do the same here
let resolved_strategy = plan_builder.resolve_selection_strategy();
plan_builder = plan_builder.with_selection_strategy(resolved_strategy);
I also found it slightly strange that the PlanBuilder has both methods -- maybe it would be simpler if build() simply resolved the strategy directly.
However, I think we can do this as a follow on PR
I chose preferred_selection_strategy because this is just the option preferred by the ReadPlan; the final strategy is decided only after the page offsets are taken into consideration.
Here the sync reader won't do page skipping, so the preferred strategy is the final result.
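The distinction between a preferred and a final strategy can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `Strategy`, `skips_whole_page`, and `resolve_strategy` are hypothetical names, page and selection extents are modelled as plain row ranges, and the fallback rule (mask becomes selectors when a whole page is skipped) is assumed from the discussion above.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the two execution strategies discussed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Mask,
    Selectors,
}

/// Returns true if at least one page contains no selected row.
///
/// `selected` holds the selected row ranges and `pages` holds each page's
/// row range (both assumed inputs, e.g. derived from the offset index).
fn skips_whole_page(selected: &[Range<usize>], pages: &[Range<usize>]) -> bool {
    pages.iter().any(|page| {
        // A page is skipped when no selected range overlaps it.
        !selected
            .iter()
            .any(|s| s.start < page.end && page.start < s.end)
    })
}

/// Turn the preferred strategy into the final one.
///
/// Without page information (`pages == None`, as for a reader that never
/// skips pages) the preferred strategy is already final. With page
/// information, a preferred mask falls back to selectors if the selection
/// would skip an entire page.
fn resolve_strategy(
    preferred: Strategy,
    selected: &[Range<usize>],
    pages: Option<&[Range<usize>]>,
) -> Strategy {
    match (preferred, pages) {
        (Strategy::Mask, Some(pages)) if skips_whole_page(selected, pages) => {
            Strategy::Selectors
        }
        _ => preferred,
    }
}
```

Passing `None` for `pages` mirrors the sync-reader case described above, where the preferred strategy is simply used as-is.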
}
}

/// Configure how row selections should be materialised during execution
It would be nice, eventually, to change RowSelection so the callers could provide a bitmap as well.
I will think about how this might look, but it is not something to change in this PR
Yes, I am trying some options to see if I can combine the RowSelectionStrategy and the RowSelection together into the public API...
Minor: clean up of selection strategy code
Another minor cleanup for the read plan builder
impl ReadPlan {
    /// Returns a mutable reference to the selection, if any
    pub fn selection_mut(&mut self) -> Option<&mut VecDeque<RowSelector>> {
    pub fn selection_mut(&mut self) -> Option<&mut RowSelectionCursor> {
🤔 it seems like this is a public API (so we can't change its signature)
https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.ReadPlan.html#method.selection_mut
We can get around this by making a function like
pub fn row_selection_cursor_mut(&mut self) -> Option<&mut RowSelectionCursor> {
And then
#[deprecated(since = "57.1.0", note = "Use `row_selection_cursor_mut` instead")]
pub fn selection_mut(&mut self) -> Option<&mut VecDeque<RowSelector>> {
...
I am still playing around to see if I can make the RowSelectionCursor
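For reference, the deprecate-and-add pattern suggested above could look roughly like this. It is a self-contained sketch with simplified stand-in types, not the parquet crate's actual definitions: the enum layout of `RowSelectionCursor`, the `new` constructor, and the choice to return `None` from the legacy accessor when the plan is mask-backed are all assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for a run of selected or skipped rows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowSelector {
    pub row_count: usize,
    pub skip: bool,
}

/// Simplified stand-in cursor: either the legacy selector queue or a mask.
pub enum RowSelectionCursor {
    Selectors(VecDeque<RowSelector>),
    Mask(Vec<bool>),
}

pub struct ReadPlan {
    cursor: Option<RowSelectionCursor>,
}

impl ReadPlan {
    /// Hypothetical constructor, present only to keep the sketch usable.
    pub fn new(cursor: Option<RowSelectionCursor>) -> Self {
        Self { cursor }
    }

    /// New accessor exposing the cursor, whichever variant it holds.
    pub fn row_selection_cursor_mut(&mut self) -> Option<&mut RowSelectionCursor> {
        self.cursor.as_mut()
    }

    /// Old accessor kept with its original signature so existing callers
    /// still compile. Assumption: it can only return the queue when the
    /// plan is selector-backed, and returns `None` otherwise.
    #[deprecated(since = "57.1.0", note = "Use `row_selection_cursor_mut` instead")]
    pub fn selection_mut(&mut self) -> Option<&mut VecDeque<RowSelector>> {
        match self.cursor.as_mut() {
            Some(RowSelectionCursor::Selectors(selectors)) => Some(selectors),
            _ => None,
        }
    }
}
```

Callers of the old method keep compiling and get a deprecation warning that points them at the new accessor, which is what keeps the change non-breaking.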
I propose this change in
Rework RowSelectionCursor to use enums
Here is another PR

Here are some notes to myself to file PRs for follow on work:
Split RowSelectionPolicy from RowSelectionStrategy

I plan to leave this PR open for a few more days to gather any additional comments and then merge it in. I may also do some prototyping of additional optimizations (avoiding the masks and fetches entirely) for fun. Thank you again @hhhizzz -- I think this PR will finally get us past the hurdle for making predicate pushdown work well for all cases.












Which issue does this PR close?
Rationale for this change
Predicate pushdown and row filters often yield very short alternating select/skip runs; materialising those runs through a VecDeque<RowSelector> is both allocation-heavy and prone to panics when entire pages are skipped and never decoded. This PR introduces a mask-backed execution path for short selectors, adds the heuristics and guards needed to decide between masks and selectors, and provides tooling to measure the trade-offs so we can tune the threshold confidently.
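As a rough illustration of the heuristic described above, the sketch below computes the average selector length and picks a strategy from it, then expands the selectors into a boolean mask. The names `Selector`, `SelectionStrategy`, `choose_strategy`, and `selectors_to_mask` are hypothetical, and both the strict `<` comparison and any particular threshold value are assumptions rather than the PR's exact rule.

```rust
/// Simplified stand-in for one select/skip run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SelectionStrategy {
    Mask,
    Selectors,
}

/// Pick the mask when runs are short on average, the selector queue otherwise.
/// `threshold` is a tunable parameter; its value here is up to the caller.
fn choose_strategy(selectors: &[Selector], threshold: usize) -> SelectionStrategy {
    if selectors.is_empty() {
        return SelectionStrategy::Selectors;
    }
    let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
    let avg_len = total_rows / selectors.len();
    if avg_len < threshold {
        SelectionStrategy::Mask
    } else {
        SelectionStrategy::Selectors
    }
}

/// Expand select/skip runs into one boolean per row (true = keep the row).
fn selectors_to_mask(selectors: &[Selector]) -> Vec<bool> {
    let mut mask = Vec::new();
    for s in selectors {
        mask.extend(std::iter::repeat(!s.skip).take(s.row_count));
    }
    mask
}
```

Many short alternating runs give a small average and favour the mask, while a few long runs keep the selector queue, which can skip large spans without decoding them.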
What changes are included in this PR?
- RowSelectionStrategy plus a RowSelectionCursor that can iterate either a boolean mask or the legacy selector queue, along with a public guard/override so tests and benchmarks can tweak the average-selector-length heuristic.
- ReadPlanBuilder now trims selections, estimates their density, and chooses the faster mask strategy when safe; ParquetRecordBatchReader streams mask chunks, skips contiguous gaps, filters the projected batch with Arrow’s boolean kernel, and still falls back to selectors when needed.
- RowSelection can now inspect offset indexes to detect when a selection would skip entire pages; both the synchronous and asynchronous readers consult that signal so row filters no longer panic when predicate pruning drops whole pages.
- row_selection_state Criterion benchmark plus dev/row_selection_analysis.py to run the bench, export CSV summaries, and render comparative plots across selector lengths, column widths, and Utf8View payload sizes; wired the bench into parquet/Cargo.toml.
- Tests (test_row_selection_interleaved_skip, test_row_selection_mask_sparse_rows, test_row_filter_full_page_skip_is_handled and its async twin) and updated the push-decoder size assertion to reflect the new state.
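The mask-backed read path listed above (stream mask chunks, skip contiguous gaps, filter the decoded batch) can be sketched without any Arrow dependency. This is a toy model under stated assumptions, not the reader's code: `read_with_mask` is a hypothetical function, a slice of `i64` stands in for a decoded column, and a plain loop stands in for Arrow's boolean filter kernel.

```rust
/// Return the values of `column` whose mask bit is set, processing at most
/// `chunk_size` rows per step.
fn read_with_mask(column: &[i64], mask: &[bool], chunk_size: usize) -> Vec<i64> {
    assert_eq!(column.len(), mask.len(), "mask must cover every row");
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < mask.len() {
        // Skip the contiguous gap of unselected rows without "decoding" it.
        match mask[pos..].iter().position(|&keep| keep) {
            Some(gap) => pos += gap,
            None => break, // no selected rows remain
        }
        let end = (pos + chunk_size).min(mask.len());
        // "Decode" the whole chunk, including unselected rows inside it...
        let decoded = &column[pos..end];
        // ...then keep only the rows whose mask bit is set.
        for (value, &keep) in decoded.iter().zip(&mask[pos..end]) {
            if keep {
                out.push(*value);
            }
        }
        pos = end;
    }
    out
}
```

The trade-off this models is that rows inside a chunk are decoded even when unselected, which is cheap for short alternating runs but wasteful for long gaps; hence the leading-gap skip and the fallback to selectors.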
Are these changes tested?
ReadPlanBuilder threshold tests; the Criterion bench + Python tooling provide manual validation for performance tuning. Full parquet/arrow test suites will still run in CI.
Are there any user-facing changes?
RowSelectionStrategy, RowSelectionCursor, and set_avg_selector_len_mask_threshold for experimentation, and developers gain the new benchmarking/plotting workflow. No breaking API changes were introduced.