rluvaton commented Oct 23, 2025

Which issue does this PR close?

N/A

Rationale for this change

Improve the performance of array iterator functions by skipping per-element null checks when the array has no nulls

What changes are included in this PR?

Override the default `ArrayIter` implementations of the following functions with ones that check whether the array has any nulls:

  • nth / nth_back
  • last
  • count
  • for_each
  • fold / rfold
  • all
  • any
  • find_map
  • find / rfind
  • partition
  • position / rposition

This implements every function in `Iterator`/`DoubleEndedIterator` whose default implementation either relies on a base function that we can't implement on stable Rust (e.g. `try_fold`) or is naive (e.g. calling `next()` many times).
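A minimal sketch of the idea, assuming a hypothetical simplified iterator (`NullAwareIter`, not the arrow-rs `ArrayIter` API) over an `i32` slice with an optional validity mask where `true` means valid. The overridden `fold` decides once whether a null check is needed, instead of checking validity on every `next()` call:

```rust
/// Hypothetical, simplified stand-in for arrow's `ArrayIter` (not the arrow-rs API):
/// an `i32` slice plus an optional validity mask (`true` = valid).
pub struct NullAwareIter<'a> {
    values: &'a [i32],
    validity: Option<&'a [bool]>,
    current: usize,
    current_end: usize,
}

impl<'a> NullAwareIter<'a> {
    pub fn new(values: &'a [i32], validity: Option<&'a [bool]>) -> Self {
        Self { values, validity, current: 0, current_end: values.len() }
    }
}

impl<'a> Iterator for NullAwareIter<'a> {
    type Item = Option<i32>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.current == self.current_end {
            return None;
        }
        let i = self.current;
        self.current += 1;
        // Generic path: check validity for every element.
        let valid = self.validity.map_or(true, |mask| mask[i]);
        Some(if valid { Some(self.values[i]) } else { None })
    }

    // Override: branch once on "are there nulls at all?" for the remaining range.
    fn fold<B, F>(self, init: B, mut f: F) -> B
    where
        F: FnMut(B, Self::Item) -> B,
    {
        let range = self.current..self.current_end;
        match self.validity {
            // No mask: every slot is valid, so skip the per-element check.
            None => self.values[range]
                .iter()
                .fold(init, |acc, &v| f(acc, Some(v))),
            // Mask present: pair each value with its validity bit.
            Some(mask) => self.values[range.clone()]
                .iter()
                .zip(&mask[range])
                .fold(init, |acc, (&v, &ok)| f(acc, ok.then_some(v))),
        }
    }
}
```

The same branch-once pattern would apply to the other listed functions (`all`, `any`, `find`, ...); whether it actually beats the default implementation has to be confirmed with benchmarks.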

Are these changes tested?

Extracted the tests:

Are there any user-facing changes?

Nope


Benchmarks are in:

…ull/non-nullable versions

implemented for:
- `nth` / `nth_back`
- `last`
- `count`
- `for_each`
- `fold` / `rfold`
- `all`
- `any`
- `find_map`
- `find` / `rfind`
- `partition`
- `position` / `rposition`

github-actions bot added the arrow label (Changes to the arrow crate) Oct 23, 2025
rluvaton added a commit to rluvaton/arrow-rs that referenced this pull request Oct 27, 2025
…from the iterator back

This also adds a LOT of tests extracted from the following PR (which is how I found that bug):
- apache#8697
rluvaton marked this pull request as ready for review October 27, 2025 22:26
alamb pushed a commit that referenced this pull request Nov 3, 2025
…from the iterator back (#8728)

# Which issue does this PR close?

N/A

# Rationale for this change

For the fix: the array iterator implements `ExactSizeIterator` and
`DoubleEndedIterator`, so it should report the current remaining length
even after items have been consumed from the other side (the back)

# What changes are included in this PR?

Fix by using `current_end` instead of `array.len()` when computing the
reported length. This also adds a LOT of tests extracted from the
following PR (which is how I found that bug):
- #8697
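A minimal sketch of the bug and the fix, assuming a hypothetical simplified double-ended iterator (`BothEndsIter`, not the real `ArrayIter`) that tracks its unconsumed range as `current..current_end`. The remaining length must be `current_end - current`; using the full length (`values.len() - current`) over-reports once `next_back()` has consumed items:

```rust
/// Hypothetical, simplified stand-in for arrow's `ArrayIter` (not the real type):
/// walks indices `current..current_end` of a slice from both ends.
pub struct BothEndsIter<'a> {
    values: &'a [i32],
    current: usize,
    current_end: usize,
}

impl<'a> BothEndsIter<'a> {
    pub fn new(values: &'a [i32]) -> Self {
        Self { values, current: 0, current_end: values.len() }
    }
}

impl<'a> Iterator for BothEndsIter<'a> {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        if self.current == self.current_end {
            return None;
        }
        self.current += 1;
        Some(self.values[self.current - 1])
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // Buggy variant: `self.values.len() - self.current` ignores items
        // already taken from the back via `next_back`.
        let remaining = self.current_end - self.current;
        (remaining, Some(remaining))
    }
}

impl<'a> DoubleEndedIterator for BothEndsIter<'a> {
    fn next_back(&mut self) -> Option<i32> {
        if self.current == self.current_end {
            return None;
        }
        self.current_end -= 1;
        Some(self.values[self.current_end])
    }
}

// `len()` comes from the default `ExactSizeIterator` impl, which relies on `size_hint`.
impl<'a> ExactSizeIterator for BothEndsIter<'a> {}
```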

# Are these changes tested?

Yes

# Are there any user-facing changes?

Kinda

rluvaton commented Nov 3, 2025

@alamb this can now be reviewed; it is pretty simple.


alamb commented Nov 3, 2025

Do we have any performance benchmarks that show if this improves performance?


rluvaton commented Nov 3, 2025

I did not run any, but it should be faster. I will try to create benchmarks later.

rluvaton added a commit to rluvaton/arrow-rs that referenced this pull request Nov 3, 2025

rluvaton commented Nov 3, 2025

@alamb added benchmarks in:

rluvaton added a commit to rluvaton/arrow-rs that referenced this pull request Nov 3, 2025

rluvaton commented Nov 4, 2025

I ran the benchmarks, and it turned out that the default implementation is faster, by a lot, in some cases for some reason. However, the `nth`, `nth_back`, `count`, and `last` overrides are still valuable, as the default implementations just iterate IIRC.

alamb pushed a commit that referenced this pull request Nov 11, 2025
… and `count` (#8785)

# Which issue does this PR close?

N/A

# Rationale for this change
The default implementations step through the iterator item by item to
get the value, while we can compute it in constant time

# What changes are included in this PR?
Override `nth`, `nth_back`, `last`, and `count`
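A minimal sketch of the constant-time overrides, assuming a hypothetical simplified iterator (`JumpIter`, not the real `ArrayIter`) that tracks its unconsumed range as `current..current_end`. Each override is index arithmetic on that range instead of repeated `next()` calls:

```rust
/// Hypothetical, simplified stand-in for arrow's `ArrayIter` (not the real type).
pub struct JumpIter<'a> {
    values: &'a [i32],
    current: usize,
    current_end: usize,
}

impl<'a> JumpIter<'a> {
    pub fn new(values: &'a [i32]) -> Self {
        Self { values, current: 0, current_end: values.len() }
    }
}

impl<'a> Iterator for JumpIter<'a> {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        if self.current == self.current_end {
            return None;
        }
        self.current += 1;
        Some(self.values[self.current - 1])
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.current_end - self.current;
        (remaining, Some(remaining))
    }

    // O(1): jump to `current + n` instead of calling `next` n times.
    fn nth(&mut self, n: usize) -> Option<i32> {
        if n >= self.current_end - self.current {
            // Like the default, an out-of-range `nth` exhausts the iterator.
            self.current = self.current_end;
            return None;
        }
        self.current += n;
        self.next()
    }

    // O(1): the count is just the remaining length.
    fn count(self) -> usize {
        self.current_end - self.current
    }

    // O(1): read the element just before `current_end`.
    fn last(mut self) -> Option<i32> {
        self.next_back()
    }
}

impl<'a> DoubleEndedIterator for JumpIter<'a> {
    fn next_back(&mut self) -> Option<i32> {
        if self.current == self.current_end {
            return None;
        }
        self.current_end -= 1;
        Some(self.values[self.current_end])
    }

    // O(1): move `current_end` back by `n`, then take one element.
    fn nth_back(&mut self, n: usize) -> Option<i32> {
        if n >= self.current_end - self.current {
            self.current_end = self.current;
            return None;
        }
        self.current_end -= n;
        self.next_back()
    }
}
```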

# Are these changes tested?
Covered by existing tests in this file that I added in a previous PR

# Are there any user-facing changes?
Nope


-----

Extracted from the following PR, which I will probably close because it
is not faster locally in some cases:
- #8697

2 participants