@gchalump gchalump commented Nov 6, 2025

Summary:
Add get_unique_indices on CPU
Add test to compare get_unique_indices from CPU with GPU

Differential Revision: D85736286

netlify bot commented Nov 6, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 750bbec |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/6913bfef5de30900085ec1ef |
| 😎 Deploy Preview | https://deploy-preview-5096--pytorch-fbgemm-docs.netlify.app |

meta-codesync bot commented Nov 6, 2025

@gchalump has exported this pull request. If you are a Meta employee, you can view the originating Diff in D85736286.

@meta-cla meta-cla bot added the cla signed label Nov 6, 2025
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Nov 7, 2025
Summary:

X-link: facebookresearch/FBGEMM#2103

Add `get_unique_indices` on CPU
Add test to compare `get_unique_indices` from CPU with GPU

Differential Revision: D85736286
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Nov 7, 2025
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Nov 7, 2025
@gchalump gchalump force-pushed the export-D85736286 branch 2 times, most recently from 6452f4a to e457285 Compare November 10, 2025 16:44
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Nov 10, 2025
Summary:

X-link: facebookresearch/FBGEMM#2103

Implements `get_unique_indices_cpu_impl()` to extract unique indices from linear index tensors on CPU, with comprehensive documentation and test coverage for both int32 and int64 dtypes.

Function Description
--------------------

**`get_unique_indices_cpu_impl`** processes a 1D tensor of linear indices and returns the sorted unique values with optional metadata (occurrence counts and the positions that sort the input).

### Example

```
Input:  linear_indices = [20, 0, 10, 10, 0]

Output:
unique_indices = [0, 10, 20, x, x]  (sorted, padded; x marks padding that is not valid)
unique_indices_length = [3]
unique_indices_count = [2, 2, 1, x, x]  (occurrence counts, padded the same way)
linear_index_positions_sorted = [1, 4, 2, 3, 0]  (positions that sort the input: linear_indices[[1,4,2,3,0]] = [0,0,10,10,20])
```
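
The example can be reproduced with a small pure-Python model. This is only a sketch of the described outputs, not the FBGEMM kernel: the function name `get_unique_indices_reference` is hypothetical, plain lists stand in for tensors, and `None` stands in for the `x` padding.

```python
def get_unique_indices_reference(linear_indices):
    """Pure-Python model of the described outputs (hypothetical helper)."""
    n = len(linear_indices)
    # sorted() is stable, so equal values keep their original relative order.
    positions = sorted(range(n), key=lambda i: linear_indices[i])

    unique, counts = [], []
    for pos in positions:
        value = linear_indices[pos]
        if unique and unique[-1] == value:
            counts[-1] += 1          # another occurrence of the current value
        else:
            unique.append(value)     # first occurrence of a new value
            counts.append(1)

    num_unique = len(unique)
    pad = [None] * (n - num_unique)  # None models the "x" padding entries
    return unique + pad, [num_unique], counts + pad, positions


unique, length, counts, positions = get_unique_indices_reference([20, 0, 10, 10, 0])
print(unique)     # [0, 10, 20, None, None]
print(length)     # [3]
print(counts)     # [2, 2, 1, None, None]
print(positions)  # [1, 4, 2, 3, 0]
```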

### Returns

1.  **unique_indices**: Sorted unique values in the input dtype, padded to the input size (only the first `num_unique` elements are valid)
2.  **unique_indices_length**: Scalar tensor holding `num_unique`, the number of unique values
3.  **unique_indices_count** (optional): Occurrence count for each unique value (int32)
4.  **linear_index_positions_sorted** (optional): Positions in the original input that, when used to index it, yield the input in sorted order (int32)
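
A caller would presumably slice the padded outputs to the valid prefix before using them. The sketch below shows one way the four outputs relate to each other, using the values from the example above. The helper `expand_sorted` is hypothetical, plain lists stand in for tensors, and `-1` is an arbitrary stand-in for the padding.

```python
def expand_sorted(unique_indices, unique_indices_length, unique_indices_count):
    """Rebuild the sorted input from the padded unique/count outputs."""
    num_unique = unique_indices_length[0]
    sorted_values = []
    # Only the first num_unique entries are valid; everything after is padding.
    valid_pairs = zip(unique_indices[:num_unique], unique_indices_count[:num_unique])
    for value, count in valid_pairs:
        sorted_values.extend([value] * count)
    return sorted_values


linear_indices = [20, 0, 10, 10, 0]
unique_indices = [0, 10, 20, -1, -1]        # -1 is a stand-in for padding
unique_indices_length = [3]
unique_indices_count = [2, 2, 1, -1, -1]
linear_index_positions_sorted = [1, 4, 2, 3, 0]

# Gathering the input at the returned positions gives the sorted input...
gathered = [linear_indices[p] for p in linear_index_positions_sorted]
# ...which matches repeating each unique value by its count.
rebuilt = expand_sorted(unique_indices, unique_indices_length, unique_indices_count)
print(gathered)  # [0, 0, 10, 10, 20]
print(rebuilt)   # [0, 0, 10, 10, 20]
```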

### Implementation Details

*   Uses `at::unique_dim()` for core uniqueness computation with stable sorting
*   Preserves input dtype for unique values
*   Converts counts and positions to int32 for consistency with CUDA implementation
*   Supports both `torch.int` (int32) and `torch.long` (int64) input dtypes

### Test Coverage

Added dtype parameterization to `test_get_unique_indices_cpu` to validate both int32 and int64, ensuring the CPU implementation supports all dtypes that the CUDA implementation supports.
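
When comparing CPU and GPU outputs, only the valid prefix of the padded tensors can meaningfully be compared. The following is a rough sketch of such a check, not the test added in this PR: `assert_unique_outputs_match` is a hypothetical helper and plain lists stand in for tensors. It compares gathered values rather than raw positions, on the assumption (not stated in this PR) that two backends might order equal values differently.

```python
def assert_unique_outputs_match(out_a, out_b, linear_indices):
    """Compare two (unique, length, count, positions) tuples, ignoring padding."""
    unique_a, len_a, count_a, pos_a = out_a
    unique_b, len_b, count_b, pos_b = out_b

    assert len_a == len_b
    n = len_a[0]
    # Padding beyond num_unique is not valid, so compare only the prefix.
    assert unique_a[:n] == unique_b[:n]
    assert count_a[:n] == count_b[:n]

    # Each positions output must be a permutation of 0..N-1 ...
    expected = list(range(len(linear_indices)))
    assert sorted(pos_a) == expected
    assert sorted(pos_b) == expected
    # ... and both must gather the input into the same sorted sequence.
    gathered_a = [linear_indices[p] for p in pos_a]
    gathered_b = [linear_indices[p] for p in pos_b]
    assert gathered_a == gathered_b == sorted(linear_indices)


linear_indices = [20, 0, 10, 10, 0]
cpu_out = ([0, 10, 20, -1, -1], [3], [2, 2, 1, -1, -1], [1, 4, 2, 3, 0])
# Different padding and a different tie order still count as a match.
gpu_out = ([0, 10, 20, 7, 7], [3], [2, 2, 1, 0, 0], [4, 1, 3, 2, 0])
assert_unique_outputs_match(cpu_out, gpu_out, linear_indices)
```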

Differential Revision: D85736286
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Nov 10, 2025
@gchalump gchalump force-pushed the export-D85736286 branch 2 times, most recently from d92220d to 1ac09f4 Compare November 11, 2025 18:47
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Nov 11, 2025
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Nov 11, 2025
@gchalump gchalump force-pushed the export-D85736286 branch 2 times, most recently from 750bbec to 1621b41 Compare November 12, 2025 01:33
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Nov 12, 2025
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Nov 12, 2025