Alamb/test boolean kernels #8793

alamb · 2025-11-05T22:14:18Z

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

Closes #NNN.

Rationale for this change

Similar to TESTING: Change `BooleanBuffer::append_packed_range` to use bitwise_binary_op #8744

What changes are included in this PR?

Apply the bitwise operations from "feat: add apply_unary_op and apply_binary_op bitwise operations" #8619 to the existing boolean kernels

benchmark results for boolean_kernels

Are these changes tested?

will benchmark
If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?

Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

alamb · 2025-11-05T22:27:02Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/test_boolean_kernels (b0cf38b) to eaca232 diff
BENCH_NAME=boolean_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench boolean_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_test_boolean_kernels
Results will be posted here when complete

alamb · 2025-11-05T22:31:04Z

🤖: Benchmark completed

Details

group         alamb_test_boolean_kernels             main
-----         --------------------------             ----
and           1.00    265.0±0.39ns        ? ?/sec    1.05    278.3±1.44ns        ? ?/sec
and_sliced    1.00    254.0±0.36ns        ? ?/sec    4.84   1228.2±2.47ns        ? ?/sec
not           1.00    189.4±0.31ns        ? ?/sec    1.13    214.3±1.98ns        ? ?/sec
not_sliced    1.00    592.5±0.89ns        ? ?/sec    1.18    697.8±1.35ns        ? ?/sec
or            1.13    282.5±1.07ns        ? ?/sec    1.00    251.1±2.04ns        ? ?/sec
or_sliced     1.00    284.1±0.75ns        ? ?/sec    3.85   1093.3±2.06ns        ? ?/sec
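As a reading aid for the table above: the number in front of each timing appears to be that run's mean time divided by the fastest mean in the same row, so 1.00 marks the faster side. The sketch below recomputes those factors from the reported means. The helper name `relative_factor` is hypothetical, and the inputs are the rounded means from the table, so the last digit may differ from what the benchmark tool derives from unrounded data.

```rust
/// Hypothetical helper: how many times slower `time_ns` is than the faster
/// of the two measurements in a row (1.0 means it is the faster one).
fn relative_factor(time_ns: f64, other_ns: f64) -> f64 {
    time_ns / time_ns.min(other_ns)
}

fn main() {
    // Mean timings in nanoseconds copied from the table: (group, branch, main).
    let rows = [
        ("and", 265.0, 278.3),
        ("and_sliced", 254.0, 1228.2),
        ("not", 189.4, 214.3),
        ("not_sliced", 592.5, 697.8),
        ("or", 282.5, 251.1),
        ("or_sliced", 284.1, 1093.3),
    ];
    for (name, branch, main) in rows {
        println!(
            "{name:<12}{:>6.2}{:>10.2}",
            relative_factor(branch, main),
            relative_factor(main, branch)
        );
    }
}
```

Read this way, the branch is the faster side in every group except `or`, and the sliced `and`/`or` cases show the largest gap.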

alamb · 2025-11-05T22:31:28Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/test_boolean_kernels (b0cf38b) to eaca232 diff
BENCH_NAME=boolean_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench boolean_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_test_boolean_kernels
Results will be posted here when complete

alamb · 2025-11-05T22:35:24Z

🤖: Benchmark completed

Details

group         alamb_test_boolean_kernels             main
-----         --------------------------             ----
and           1.00    265.1±0.42ns        ? ?/sec    1.03    273.9±1.28ns        ? ?/sec
and_sliced    1.00    254.3±0.92ns        ? ?/sec    4.83   1227.1±3.82ns        ? ?/sec
not           1.00    191.2±0.29ns        ? ?/sec    1.13    216.3±0.32ns        ? ?/sec
not_sliced    1.00    593.1±1.10ns        ? ?/sec    1.18    697.8±0.98ns        ? ?/sec
or            1.12    282.2±0.59ns        ? ?/sec    1.00    251.6±0.56ns        ? ?/sec
or_sliced     1.00    284.2±0.68ns        ? ?/sec    3.84   1092.4±1.87ns        ? ?/sec

alamb · 2025-11-06T10:10:09Z

It is interesting that OR is slower

Dandandan · 2025-11-06T21:34:17Z

arrow-buffer/src/buffer/ops.rs

-    let rem = &rem.to_le_bytes()[0..remainder_bytes];
-    buffer.extend_from_slice(rem);
+    let len_bytes = ceil(starting_bit_in_byte + len_in_bits, 8);
+    let mut result = left[start_byte..len_bytes].to_vec();


This does an extra copy that wasn't there before (the previous code used from_trusted_len_iter)

alamb · 2025-11-07

🤔 You are right. We do need a new allocation to write into, but we don't need to copy the values.

It is fascinating, however, that this code is often still faster than the previous one (maybe due to fewer branches).

I'll see if I can optimize the case when the offsets are zero, which I think is a common case.

alamb · 2025-11-07

The optimization works well: #8807
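The extra copy discussed in this thread can be illustrated with plain `Vec<u8>` instead of arrow's `Buffer`/`MutableBuffer`. Both function names below are hypothetical and neither is the code in this PR: the first copies the left operand with `to_vec()` and then overwrites each byte, while the second writes each result byte once into the new allocation.

```rust
/// Copy `left` into a fresh Vec, then AND `right` into it in place.
/// Every output byte is written twice: once by the copy, once by the op.
fn and_copy_then_modify(left: &[u8], right: &[u8]) -> Vec<u8> {
    assert_eq!(left.len(), right.len());
    let mut result = left.to_vec();
    for (l, r) in result.iter_mut().zip(right) {
        *l &= *r;
    }
    result
}

/// Build the result directly from an iterator over both inputs,
/// so each output byte is written exactly once.
fn and_write_once(left: &[u8], right: &[u8]) -> Vec<u8> {
    assert_eq!(left.len(), right.len());
    left.iter().zip(right).map(|(l, r)| l & r).collect()
}

fn main() {
    let left = [0b1100_u8, 0xFF];
    let right = [0b1010_u8, 0x0F];
    // Both strategies must produce identical bytes.
    assert_eq!(
        and_copy_then_modify(&left, &right),
        and_write_once(&left, &right)
    );
}
```

Which variant is faster in practice depends on the loop the compiler generates, which is consistent with the observation above that the copying version was often still faster.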

Dandandan · 2025-11-06T21:34:48Z

arrow-buffer/src/buffer/ops.rs

+    let len_in_bytes = ceil(len_in_bits, 8);
+    let mut result;
+    if offset_in_bits == 0 {
+        result = left.as_slice()[0..len_in_bytes].to_vec();


This shouldn't be needed either

alamb · 2025-11-07T14:43:46Z

Thank you @Dandandan for your comments. After thinking about this some more, I think there is an important difference between "modify in place" and "create new" APIs. I filed a larger potential code reorg (which we could do while keeping backwards compatibility):

Consolidate bitwise operation implementations #8806

However, given the fact that this PR shows we can get the "create new API" to go faster, I am trying some other tricks

WIP: special case bitwise ops when buffers are u64 aligned #8807
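To make the idea of special-casing u64-aligned buffers concrete, here is a minimal sketch using only the standard library. It is not the implementation in #8807; the function name `and_bytes` is hypothetical, and it works on byte slices with no bit offset. When both inputs start on an 8-byte boundary it processes one `u64` word per step, and otherwise it falls back to a byte-by-byte loop.

```rust
/// Hypothetical sketch: AND two equal-length byte slices, using u64 words
/// when both inputs happen to start on an 8-byte boundary.
fn and_bytes(left: &[u8], right: &[u8]) -> Vec<u8> {
    assert_eq!(left.len(), right.len());
    // SAFETY: every bit pattern is a valid u64, so viewing the aligned
    // middle part of a u8 slice as u64 values is sound.
    let (l_pre, l_mid, l_suf) = unsafe { left.align_to::<u64>() };
    let (r_pre, r_mid, r_suf) = unsafe { right.align_to::<u64>() };

    // `align_to` does not promise the longest possible middle slice, so the
    // fast path is taken only when both splits line up exactly.
    if l_pre.is_empty() && r_pre.is_empty() && l_mid.len() == r_mid.len() {
        let mut out = Vec::with_capacity(left.len());
        for (l, r) in l_mid.iter().zip(r_mid) {
            // Native-endian bytes reproduce the original memory layout.
            out.extend_from_slice(&(l & r).to_ne_bytes());
        }
        // Trailing bytes that do not fill a whole word.
        out.extend(l_suf.iter().zip(r_suf).map(|(l, r)| l & r));
        out
    } else {
        // Fallback for unaligned inputs.
        left.iter().zip(right).map(|(l, r)| l & r).collect()
    }
}

fn main() {
    let a: Vec<u8> = (0..37u8).collect();
    let b: Vec<u8> = a.iter().map(|x| x.wrapping_mul(31) ^ 0x5A).collect();
    let expected: Vec<u8> = a.iter().zip(&b).map(|(l, r)| l & r).collect();
    assert_eq!(and_bytes(&a, &b), expected);
}
```

A fully safe alternative would be `chunks_exact(8)` with `u64::from_le_bytes`, at the cost of not relying on alignment at all.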

alamb · 2025-11-07T15:54:39Z

#8807 is looking much more promising

rluvaton and others added 30 commits October 12, 2025 16:48

add bitwise ops

c7d9267

add bitwise ops

d14e5b7

cleanup

739fe0a

pub(crate) as I don't like that we have both mutable and only left mutable. but I don't want to pass slice of bytes as then I don't know the source and users must make sure that they hold the same promises as Buffer/MutableBuffer

0e15b32

start adding tests

c442299

add tests

2f28dc3

add trait for left

c4676a6

format

da03628

revert changes

652a256

fix validation

0c29f0e

remove many unsafe and cleanup

bcd4863

format

6b7bfe9

add reproduction test

aec92d6

extract, cleanup and add comments

db3e853

add comments

0a64bcb

Merge remote-tracking branch 'apache/main' into add-bitwise-ops-to-boolean-buffer-builder

ca621f8

Update arrow-buffer/src/buffer/mutable_ops.rs

d63d72c

Merge branch 'add-bitwise-ops-to-boolean-buffer-builder' of github.com:rluvaton/arrow-rs into add-bitwise-ops-to-boolean-buffer-builder

464e56c

Revert changes to boolean

07679d7

Restore enough for the tests

bfdf381

Improve docs

246d4e2

Move into mutable module

b9acb34

Add example/doc tests

d590ee1

Add tests for out of bounds

ccf266f

Add tests for unary ops

005c444

Add panic doc

3a8e760

fmt

cf52bdf

Move buffer modification to bit_utils

6dbed0b

Move tests and remove changes to MutableBufer

9ca7e45

Merge remote-tracking branch 'apache/main' into add-bitwise-ops-to-boolean-buffer-builder

5cb50d5

alamb added 2 commits November 5, 2025 12:41

Update docs

379d1ec

fix docs

1fb4981

alamb marked this pull request as draft November 5, 2025 22:14

github-actions bot added the arrow (Changes to the arrow crate) label Nov 5, 2025

Use new bitwise_binary_op in boolean kernels

b0cf38b

alamb force-pushed the alamb/test_boolean_kernels branch from f71a57e to b0cf38b Compare November 5, 2025 22:26

hack

5e4e242

Dandandan reviewed Nov 6, 2025

This was referenced Nov 7, 2025

Consolidate bitwise operation implementations #8806

Open

WIP: special case bitwise ops when buffers are u64 aligned #8807

Draft

alamb closed this Nov 7, 2025
