matthewdouglas
Member

Fixes #1782

This PR resolves int32 overflows in the indexing calculations used by the blockwise quantization and dequantization kernels. It also adds tests to verify that quantization and dequantization work on tensors with the maximum supported size of 2**31 - 1 elements. Prior to this fix, the quantization kernel could fail on tensors with more than 2**30 elements, as with the illegal memory access reported in #1782.
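
For readers unfamiliar with this class of bug, here is a minimal Python sketch of how a 32-bit index can wrap even when every valid element index fits in int32. It is not the actual kernel code: the grid-stride loop shape, the helper names (`wrap_int32`, `next_index_int32`, `next_index_int64`), and the example stride are all assumptions chosen for illustration.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Return the value a signed 32-bit register would hold (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def next_index_int32(i: int, stride: int) -> int:
    """Hypothetical loop step computed in int32: the sum wraps on overflow."""
    return wrap_int32(i + stride)


def next_index_int64(i: int, stride: int) -> int:
    """The same step computed in a wider type: no wraparound."""
    return i + stride


if __name__ == "__main__":
    n = INT32_MAX        # largest supported tensor size, in elements
    i = 2**30 + 5        # a valid index in the upper half of the tensor
    stride = 2**30       # assumed step (e.g. grid size times block size)

    bad = next_index_int32(i, stride)
    good = next_index_int64(i, stride)

    # The wrapped index is negative, so a bounds check `i < n` still passes
    # and the next access would land outside the tensor.
    print(bad, bad < n)
    # The wide index is past the end, so the loop would terminate correctly.
    print(good, good < n)
```

The point of the sketch is that the overflow happens in an intermediate value, not in the element count itself, which is why tensors well below 2**31 - 1 elements can already trigger it.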

@matthewdouglas matthewdouglas added this to the v0.48.2 milestone Oct 21, 2025
@matthewdouglas matthewdouglas added the CUDA Issues and PRs related to the CUDA backend, excluding installation/support help. label Oct 21, 2025
@matthewdouglas matthewdouglas merged commit 34400d2 into main Oct 22, 2025
252 of 264 checks passed

Development

Successfully merging this pull request may close these issues.

Illegal memory access with quantize_4bit
