-
Notifications
You must be signed in to change notification settings - Fork 1k
perf: override default implementation in ArrayIter with dedicated null/non-nullable versions
#8697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
perf: override default implementation in ArrayIter with dedicated null/non-nullable versions
#8697
Conversation
…ull/non-nullable versions implemented for: - `nth` / `nth_back` - `last` - `count` - `for_each` - `fold` / `rfold` - `all` - `any` - `find_map` - `find` / `rfind` - `partition` - `position` / `rposition`
…from the iterator back this also adds a LOT of tests extracted from (which is how I found that bug): - apache#8697
…from the iterator back (#8728) # Which issue does this PR close? N/A # Rationale for this change for the fix: the array iterator is marked as exact size iterator and double ended iterator so it should report the current length when accessed through the other side # What changes are included in this PR? fix by using `current_end` instead of `array.len()` and also adds a LOT of tests extracted from (which is how I found that bug): - #8697 # Are these changes tested? Yes # Are there any user-facing changes? Kinda
# Conflicts: # arrow-array/src/iterator.rs
|
@alamb this can now be reviewed, it is pretty simple |
|
Do we have any performance benchmarks that show if this improves performance? |
|
Did not run any but it should be, I will try to create benchmarks later |
Needed for: - apache#8697
|
@alamb added benchmarks in: |
Needed for: - apache#8697
|
I run the benchmarks and it turned out it is faster in the default implementation for some cases by a lot for some reason, but the |
… and `count` (#8785) # Which issue does this PR close? N/A # Rationale for this change The default implementations iterate over the iterator to get the value, while we can do that in constant time # What changes are included in this PR? override `nth`, `nth_back`, `last` and `count` # Are these changes tested? existing tests in this file that I added in previous pr # Are there any user-facing changes? Nope ----- Extracted from the following PR as I probably close it as it is not faster locally in some cases: - #8697
Which issue does this PR close?
N/A
Rationale for this change
Improve the performance of array iterator functions by skipping null checks on non nullable
What changes are included in this PR?
override
ArrayIterdefault function implementation with one that check if there are nulls or not:nth/nth_backlastcountfor_eachfold/rfoldallanyfind_mapfind/rfindpartitionposition/rpositionthis implement all functions in
Iterator/DoubleEndedIteratorthat the default implementation is either using some base function that we can't implement in stable (e.g.try_fold) or the implementation is naive (e.g. callingnext()a lot of times)Are these changes tested?
Extracted the tests:
ArrayIterdoes not report size hint correctly after advancing from the iterator back #8728Are there any user-facing changes?
Nope
Benchmarks are in:
ArrayIterbenchmarks #8774