
mapleFU (Member) commented Nov 9, 2025

Which issue does this PR close?

Closes the issue "Parquet-concat: supports bloom filter and page index".

Rationale for this change

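parquet-concat does not yet carry over the page index (column index and offset index) or the bloom filters of its input files. This change adds support for both.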

What changes are included in this PR?

  • Support the page index and bloom filters in parquet-concat
  • Expose an Sbbf (split block bloom filter) read API

Are these changes tested?

Tested manually by running parquet-concat from the command line.

Are there any user-facing changes?

Possibly: the behavior of parquet-concat may change.

github-actions bot added the parquet label (Changes to the parquet crate) Nov 9, 2025
mapleFU requested a review from alamb November 9, 2025 06:47
alamb (Contributor) left a comment

Thanks @mapleFU, this makes sense to me.

It would be nice to have some tests for the parquet-concat tool to guard against regressions, but since there are none today, I don't think we need to add them to merge this PR.

etseidl (Contributor) left a comment

Thanks @mapleFU. I didn't know that append_column fixed the offset index page offsets.

While testing this, I found an issue with column index writing: a missing column index is turned into a NONE index, but writing that index back out then throws an error. I'll file an issue.
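The offset index records an absolute file offset for each data page, so when a column chunk is copied to a new position in the output file, each page offset only needs to shift by the distance the chunk moved; page sizes and first-row indices are relative to the chunk and row group and stay the same. A minimal sketch of that rebasing, assuming a hypothetical `PageLocation` struct and `relocate_pages` helper that merely mirror the Parquet spec's offset-index fields (this is not the parquet crate's `append_column` code):

```rust
/// Hypothetical stand-in for the Parquet spec's `PageLocation` (one offset-index entry).
/// NOT the parquet crate's type; it only mirrors the three spec fields.
#[derive(Debug, Clone, PartialEq)]
struct PageLocation {
    /// Absolute file offset of the page (start of its page header).
    offset: i64,
    /// Size of the page in bytes, including its header.
    compressed_page_size: i32,
    /// Index of the page's first row within its row group.
    first_row_index: i64,
}

/// Hypothetical helper: rebase page offsets after a column chunk's bytes are
/// copied from `old_chunk_start` in the input file to `new_chunk_start` in the
/// output file. Sizes and row indices are relative, so they are left unchanged.
fn relocate_pages(
    pages: &[PageLocation],
    old_chunk_start: i64,
    new_chunk_start: i64,
) -> Vec<PageLocation> {
    let delta = new_chunk_start - old_chunk_start;
    pages
        .iter()
        .map(|p| PageLocation {
            offset: p.offset + delta,
            ..p.clone()
        })
        .collect()
}

fn main() {
    // Two pages of a chunk that started at byte 4 of the input file.
    let pages = vec![
        PageLocation { offset: 4, compressed_page_size: 100, first_row_index: 0 },
        PageLocation { offset: 104, compressed_page_size: 80, first_row_index: 1000 },
    ];
    // The same chunk is written at byte 5000 of the concatenated output.
    for p in relocate_pages(&pages, 4, 5000) {
        println!(
            "offset={} size={} first_row={}",
            p.offset, p.compressed_page_size, p.first_row_index
        );
    }
}
```

Because the whole chunk is copied byte for byte, a single delta per chunk is enough; no page needs to be decoded.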

etseidl (Contributor) commented Nov 10, 2025

Filed #8818


Labels

parquet (Changes to the parquet crate)

Development

Successfully merging this pull request may close these issues.

Parquet-concat: supports bloom filter and page index

3 participants