How to install smdistributed? #2452
Replies: 4 comments 1 reply
-
| Hi, the easiest way is to either use or start from one of the built-in TensorFlow/PyTorch containers. You can also build your own, starting from the SageMaker Training Toolkit at https://github.com/aws/sagemaker-training-toolkit. More details at https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-use-api.html | 
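When a job runs in one of the built-in containers, `smdistributed` is already inside the image, and the library is switched on through the `distribution` argument of the SageMaker Python SDK estimator rather than by a local `pip install`. Below is a minimal sketch of that configuration. The helper name `smdistributed_distribution` and the default values are made up for illustration, and the estimator call in the trailing comment uses placeholder values you would have to fill in.

```python
from typing import Optional


def smdistributed_distribution(
    mode: str = "dataparallel",
    processes_per_host: int = 8,
    parameters: Optional[dict] = None,
) -> dict:
    """Build the `distribution` dict for a SageMaker framework estimator.

    Hypothetical helper: only the shape of the returned dict follows the
    SageMaker Python SDK; the function itself is not part of any library.
    """
    if mode == "dataparallel":
        return {"smdistributed": {"dataparallel": {"enabled": True}}}
    if mode == "modelparallel":
        return {
            "smdistributed": {
                "modelparallel": {
                    "enabled": True,
                    # Library-specific settings, e.g. {"partitions": 2}.
                    "parameters": dict(parameters or {}),
                }
            },
            # Model parallel jobs are launched through MPI.
            "mpi": {"enabled": True, "processes_per_host": processes_per_host},
        }
    raise ValueError(f"unknown mode: {mode!r}")


# Sketch of the intended use (requires the `sagemaker` package, an AWS role,
# and a supported instance type; all values here are placeholders):
#
#   from sagemaker.pytorch import PyTorch
#   estimator = PyTorch(
#       entry_point="train.py",
#       role="<your-execution-role>",
#       instance_count=1,
#       instance_type="<supported-gpu-instance-type>",
#       framework_version="<supported-pytorch-version>",
#       py_version="<matching-python-version>",
#       distribution=smdistributed_distribution("modelparallel",
#                                               parameters={"partitions": 2}),
#   )
```

The training script itself then imports `smdistributed` inside the container, which is why the import fails on a local machine that only has `sagemaker` installed.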
-
| Note: SageMaker distributed is not currently supported in the Chinese regions. I'm adding this note because I see that you've pointed to a doc in the China region. | 
-
| 
 I see, I'm actually not sure why I ended up at a  | 
-
| You can install it with `pip install https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.4.1/cu121/2024-10-09/smdistributed_dataparallel-2.5.0-cp311-cp311-linux_x86_64.whl`. The available binaries (i.e. links to the .whl files) are listed at https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-data-parallel-support.html#distributed-data-parallel-supported-frameworks | 
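After installing a wheel (or when running inside a built-in container), one quick way to confirm that the package is visible to the interpreter is to look it up without importing it. This is a minimal sketch using only the standard library. The helper name `is_module_available` is made up for illustration, and the submodule names checked are the ones I would expect from the data parallel and model parallel libraries, not something this thread confirms.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if `name` can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError when a parent package of a
        # dotted name is missing, and ValueError for a malformed module spec.
        return False


if __name__ == "__main__":
    # Assumed module names; adjust to whichever library you installed.
    for module in (
        "smdistributed",
        "smdistributed.dataparallel",
        "smdistributed.modelparallel",
    ):
        status = "found" if is_module_available(module) else "missing"
        print(f"{module}: {status}")
```

If `smdistributed` shows as missing on a local machine that only has `sagemaker` installed, that is expected: the `sagemaker` SDK package does not bundle the distributed training libraries.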
-
What did you find confusing? Please describe.
I installed sagemaker with `pip install sagemaker --upgrade` and am attempting to use distributed model parallel with PyTorch. However, I'm unable to import `smdistributed`. The docs at https://sagemaker.readthedocs.io/en/stable/api/training/smp_versions/v1.2.0/smd_model_parallel_pytorch.html don't have installation instructions for smdistributed. How do I get `smdistributed` installed? Thank you!

I am also looking at https://docs.amazonaws.cn/en_us/sagemaker/latest/dg/model-parallel-customize-training-script-pt.html, which directs me to https://sagemaker.readthedocs.io/en/stable/api/training/smp_versions/v1.2.0/smd_model_parallel_common_api.html#smp.init to initialize the SageMaker distributed environment. But again, I'm not sure how to get the `smdistributed` library.

https://github.com/aws/amazon-sagemaker-examples has some `smdistributed` examples but doesn't provide any clear installation instructions. `environment.yml` in that repo seems to indicate that all that's needed is `sagemaker`, which I have installed.

Describe how documentation can be improved
I could not find clear installation instructions for smdistributed. Would it be possible to add these?