Skip to content

Conversation

@mkolodner-sc
Copy link
Collaborator

@mkolodner-sc mkolodner-sc commented Sep 8, 2025

Scope of work done

Removes the assumption that the supervision edge type is always directed outward from the V2 GiGL codebase. We are removing this assumption since it can introduce confusion and greater difficulty in maintainability to be internally flipped edge types due to the assumption that our supervision edge is always outward direction. We are reducing opinionation here by removing this requirement.

Specifically, it does this by:

  • Removing the flipping step when we convert labels to edge types
  • Remove the flipping step when we are splitting the data
  • Remove the flipping step when we are initializing the neighborloader with some supervision edge type
  • Make it so that if we partition_labels, we don't automatically assume src as the root node type and instead base the root node type on the edge direction
  • Adds corresponding tests

Following this change, we make no underlying assumption about the supervision edge type in the graph and assume that it also complies with the sampling_edge_direction provided. This means for a supervision edge type A -> B which is directed outward, we would have A by the anchor node type and B be the supervision node type. If the graph was inward, we'd have B be the anchor node type and A be the supervision node type.

This provided supervision edge type should adhere to what labels were specified in data preprocessor. A follow-up item from this PR is to add a check that our data preprocessor config labels adhere to the task metadata.

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

Copy link
Collaborator

@kmontemayor2-sc kmontemayor2-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Matt, thanks for the work.

I think for future readers of this PR (and also me rn...) it may be useful to add more description to the PR about why we do these things/etc. Where we were, why we are changing, and what the rules are now, etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, the DBLP dataset contains both (paper, to, author), and (author, to, paper) right 1?

I have some concerns about some case with only (paper, to, author) and a supervision edge type (author, to, paper) having previously worked, since we reversed the edge type, but now it would fail.

Does this seem correct to you? And if so could we add some test for a supervision edge type whose either doesn't have a corresponding message passing edge type, or whose message passing edge type isn't reciprocal?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can aim to increase the test coverage here, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants