[WIP] update deps #368

svij-sc · 2025-10-28T04:48:56Z

Scope of work done

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

Co-authored-by: Shubham Vij <reachme@shubhamvij.com>

svij-sc · 2025-10-30T21:21:01Z

/help

github-actions · 2025-10-30T21:21:19Z

GiGL Automation

@ 21:21:19UTC :

🤖 Available PR Commands

You can trigger the following workflows by commenting on this PR:

/help - Checkout code
/unit_test - Run Unit Tests
/integration_test - Run Integration Tests
/e2e_test - Run E2E Tests
/notebook_tests - Run Example Notebooks Tests
/lint_test - Run Linting Tests

💡 Usage: Simply comment on this PR with any of the commands above (e.g., /unit_test)

⏱️ Note: Commands may take some time to complete. Progress updates will be posted as comments.


semgrep-code-snapchat · 2025-11-08T06:42:33Z

Semgrep found 1 ssc-3c51c742-25ca-4da5-8ab1-a5a6225e2d25 finding:

python/gigl/experimental/knowledge_graph_embedding/common/graph_dataset.py
- L126 - Triage

Risk: Affected versions of pyarrow are vulnerable to Deserialization of Untrusted Data. An attacker can achieve arbitrary code execution due to a vulnerability in the package, stemming from improper handling of untrusted data during deserialization. Deserialization of Arrow IPC, Feather or Parquet data is affected.

Fix: Upgrade this library to at least version 14.0.1 at GiGL/uv.lock:3060.

Reference(s): GHSA-5wvp-7f3h-6wmm, CVE-2023-47248
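To confirm that an environment built from the updated uv.lock actually picked up the fix, a quick runtime check can help. This is a minimal sketch that assumes plain "MAJOR.MINOR.PATCH" version strings. The helper names (parse_release, is_patched, installed_pyarrow_is_patched) are hypothetical and are not part of GiGL or pyarrow. Pre-release suffixes such as "14.0.1rc1" are treated as the final release, so the `packaging` library is a better fit for anything beyond a sanity check.

```python
"""Sanity check that the installed pyarrow is at or above the patched release."""
from __future__ import annotations

import re
from importlib import metadata

# First fixed release for CVE-2023-47248, per the Semgrep finding above.
PATCHED = (14, 0, 1)


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. "14.0.1" -> (14, 0, 1)."""
    parts: list[int] = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            # Stop at suffixes like "1rc1"; pre-releases count as the final release here.
            break
    return tuple(parts)


def is_patched(version: str, patched: tuple[int, ...] = PATCHED) -> bool:
    """Compare release tuples, padding with zeros so "14" behaves like "14.0.0"."""
    release = parse_release(version)
    width = max(len(release), len(patched))
    release += (0,) * (width - len(release))
    target = patched + (0,) * (width - len(patched))
    return release >= target


def installed_pyarrow_is_patched() -> bool | None:
    """Hypothetical helper: return None if pyarrow is not installed here."""
    try:
        return is_patched(metadata.version("pyarrow"))
    except metadata.PackageNotFoundError:
        return None
```

Running installed_pyarrow_is_patched() inside the built Docker image (or the uv-managed environment) should return True once the lock has been regenerated with pyarrow >= 14.0.1.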

semgrep-code-snapchat · 2025-11-08T06:43:31Z

python/gigl/distributed/utils/networking.py

+    torch.distributed.broadcast_object_list(
+        object_list=ip_list, src=node_rank, device=device
+    )


Semgrep identified an issue in your code:
Functions reliant on pickle can result in arbitrary code execution

To resolve this comment:

✨ Commit Assistant fix suggestion

Suggested change

-    torch.distributed.broadcast_object_list(
-        object_list=ip_list, src=node_rank, device=device
-    )
+    # NOTE: torch.distributed.broadcast_object_list uses Python's pickle under the hood, which can be unsafe if any node is untrusted.
+    # Here, IP addresses are plain strings, which are safe as long as all nodes are trusted and fully controlled, with no user input.
+    # If this environment is not trusted, consider using tensor-based communication instead.
+    torch.distributed.broadcast_object_list(
+        object_list=ip_list, src=node_rank, device=device
+    )
+    # If you need stricter security and cannot trust all parties, see the following manual tensor-safe approach:
+    # import torch
+    # MAX_IP_LEN = 64  # Maximum expected length of IP string
+    # if rank == node_rank:
+    #     ip_str_bytes = ip_list[0].encode('utf-8')
+    #     ip_tensor = torch.zeros(MAX_IP_LEN, dtype=torch.uint8, device=device)
+    #     ip_tensor[:len(ip_str_bytes)] = torch.tensor(list(ip_str_bytes), dtype=torch.uint8, device=device)
+    # else:
+    #     ip_tensor = torch.zeros(MAX_IP_LEN, dtype=torch.uint8, device=device)
+    # torch.distributed.broadcast(ip_tensor, src=node_rank)
+    # node_ip = bytes(ip_tensor.cpu().numpy()).rstrip(b'\x00').decode('utf-8')
+    # logger.info(f"Rank {rank} received master internal IP: {node_ip}")
+    # assert node_ip, "Could not retrieve master node's internal IP"
+    # The rest of the code can remain unchanged for primitive string communication in trusted setups.
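The commented-out fallback relies on padding the IP string to a fixed MAX_IP_LEN, because torch.distributed.broadcast needs a tensor of the same shape on every rank. The sketch below isolates just that encode/decode step in plain Python so it can be checked without a process group. encode_ip and decode_ip are hypothetical helper names, and validating with the standard ipaddress module is an added assumption, not part of the Semgrep suggestion.

```python
"""Plain-Python sketch of the fixed-length encoding behind the tensor-based fallback."""
from __future__ import annotations

import ipaddress

# Same bound as the suggestion; the longest textual IPv6 address is 45 characters.
MAX_IP_LEN = 64


def encode_ip(ip: str, max_len: int = MAX_IP_LEN) -> list[int]:
    """Hypothetical helper: validate an IP string and zero-pad its bytes to max_len."""
    ipaddress.ip_address(ip)  # raises ValueError if this is not an IPv4/IPv6 address
    raw = ip.encode("ascii")
    if len(raw) > max_len:
        raise ValueError(f"IP string longer than {max_len} bytes: {ip!r}")
    # Every rank must broadcast a buffer of identical shape, hence the padding.
    return list(raw) + [0] * (max_len - len(raw))


def decode_ip(buf: list[int]) -> str:
    """Hypothetical helper: strip the zero padding, decode, and re-validate."""
    ip = bytes(buf).rstrip(b"\x00").decode("ascii")
    if not ip:
        raise ValueError("received an empty IP buffer")
    ipaddress.ip_address(ip)  # reject anything that is not an IP address
    return ip
```

In the real fallback, the source rank would wrap the list as torch.tensor(buf, dtype=torch.uint8, device=device), the other ranks would allocate torch.zeros(MAX_IP_LEN, dtype=torch.uint8, device=device), and after the broadcast decode_ip(ip_tensor.cpu().tolist()) would recover the string. Unlike pickle, a malformed payload here can only fail validation; it cannot execute code.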

Step-by-step instructions

Avoid using torch.distributed.broadcast_object_list with arbitrary objects, since this uses Python's pickle under the hood and can be unsafe if any sender is malicious.

Make sure ip_list contains only primitive data types. In this case, the IP addresses are already plain strings.

If you control every node in your distributed setup and trust all sources, document in your code that use of these broadcast functions requires a trusted environment and does not accept user-supplied objects.

Alternatively, if any part of your distributed system could be compromised or you want maximum safety, replace broadcast_object_list with tensor-based communication: on the source rank, encode the IP string into a zero-padded torch.uint8 tensor of a fixed length (every rank must pass a tensor of the same shape and dtype to torch.distributed.broadcast), and on the receiving end strip the padding and decode it, e.g. ip_str = bytes(ip_tensor.tolist()).rstrip(b"\x00").decode("utf-8").

Review all other uses of broadcast_object_list, all_gather_object, gather_object, and scatter_object_list to ensure they only serialize primitive types like integers or validated ASCII strings.

IP addresses as plain strings are safe as long as you trust all nodes in your torch.distributed setup; pickle is only a risk if objects originate from or can be influenced by an attacker.
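For the review of other call sites, a small scanner built on the standard ast module can list every call to the pickle-based collectives. This is a sketch that assumes the calls are written either as attribute access (torch.distributed.broadcast_object_list(...), dist.all_gather_object(...)) or as a directly imported name; aliased imports such as `from torch.distributed import gather_object as g` would be missed. find_object_collectives is a hypothetical helper, not an existing GiGL or Semgrep utility.

```python
"""Sketch: list call sites of pickle-based torch.distributed collectives in source code."""
from __future__ import annotations

import ast

OBJECT_COLLECTIVES = {
    "broadcast_object_list",
    "all_gather_object",
    "gather_object",
    "scatter_object_list",
}


def find_object_collectives(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return sorted (line number, function name) pairs for calls in OBJECT_COLLECTIVES."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Covers `torch.distributed.broadcast_object_list(...)` and a bare imported name.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name in OBJECT_COLLECTIVES:
            hits.append((node.lineno, name))
    return sorted(hits)
```

To scan the package, one could loop over pathlib.Path("python/gigl").rglob("*.py") and pass each file's read_text() to find_object_collectives, then check that every reported call only carries trusted, primitive data.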

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

/fp <comment> for false positive

/ar <comment> for acceptable risk

/other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by pickles-in-pytorch-distributed.

You can view more details about this finding in the Semgrep AppSec Platform.


svij-sc and others added 5 commits October 27, 2025 21:48

update deps

9927cf0

Co-authored-by: Shubham Vij <reachme@shubhamvij.com>

update

6b0baf0

backup

e6440e9

update

bd676f7

remove raw python / conda from actions and docker

7f487e9

svij-sc and others added 23 commits October 30, 2025 22:39

fix

aec38ae

test ci

e5a8738

test

d10eefd

lock .python-version file

600d4b3

build base images test

0852497

add curl as dep to docker images

05e7969

copy pyproject.toml to docker

dffa77e

fix

9394b47

update cuda image

ff95106

docs for install_glt

75c92fe

push image names if run as test

b250199

[AUTOMATED] Update dep.vars, and other relevant files with new image names

5fcf628

pre-commit from uv

13a3df6

set user to root for builder image

573a209

build docker image

79c7b32

[AUTOMATED] Update dep.vars, and other relevant files with new image names

102e36b

disable docker image push

1c1a957

test

8695603

mk dir

e83e724

update docker UV_PROJECT_ENVIRONMENT

d3f4cfa

build docker image

97d4d62

[AUTOMATED] Update dep.vars, and other relevant files with new image names

5b33b7e

remove requirements files and lock uv deps

d18a429

svij-sc and others added 15 commits November 2, 2025 06:57

try 2

90de670

install pip after all vars setup

dcfb742

tools to tool

59ce50c

[AUTOMATED] Update dep.vars, and other relevant files with new image names

551e40f

test

a1b2d0f

on push test

4e94b50

lock

3e8d8d6

deploy image

84d497f

[AUTOMATED] Update dep.vars, and other relevant files with new image names

87d8b93

test

f72b6d7

try new dataflow

5924903

intro docker entrypoint script

af75360

remove containers from dockerignore

d35ec3a

update beam image

ce13d2c

fix

512be71

semgrep-code-snapchat bot reviewed Nov 8, 2025


github-actions bot and others added 13 commits November 8, 2025 06:55

[AUTOMATED] Update dep.vars, and other relevant files with new image names

fbd93e2

try new docker images

5eaddb5

boom?

07c0e1e

skip glt install

21e2b05

[AUTOMATED] Update dep.vars, and other relevant files with new image names

b04c726

try 2

f402cce

[AUTOMATED] Update dep.vars, and other relevant files with new image names

25d0022

test

e3cb635

rebuild cuda

ea9d30c

try 2

4698dab

[AUTOMATED] Update dep.vars, and other relevant files with new image names

be03566

test

5df22cf

update GPUs

8ceac1a
