PyTorch Docker
This document shows how to run the PyTorch Docker application on AAC.
How To Launch PyTorch Docker Application
To start creating a workload, sign in to the AAC.
Select Application
Click Applications in the navigation menu. Then select the PyTorch application with the preferred version from the list.
Note: In this example, Pytorch_2_4_0_Rocm6_3_0 (Jupyterlab_4_2_5) is selected.
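Once the workload is up and running (later in this guide), a quick check such as the sketch below can confirm that the container's PyTorch and ROCm builds match the selected image. This is an illustrative snippet, not part of the AAC UI; the exact version strings reported may differ slightly from the image name.

```python
# Illustrative sanity check: confirm the container's PyTorch and ROCm builds
# roughly match the selected image (Pytorch_2_4_0_Rocm6_3_0). Run it inside
# the workload, e.g. in a JupyterLab cell or over SSH.
import torch

print("PyTorch version:", torch.__version__)       # expected to start with 2.4
print("ROCm/HIP build:", torch.version.hip)        # None would indicate a non-ROCm build
print("GPU backend available:", torch.cuda.is_available())
```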
New Workload
Click the New Workload button in the top right corner.
Select Team
If the user is assigned to more than one team, a pop-up window prompts the user to select one of the customer teams they belong to. If only one team is assigned to the user, this step is not required.
Note: In this example, the team AMD Internal is selected.
Click the Launch button.
Select input files
The files required for the workload can be uploaded in this step by clicking Upload files. Click the Next button to continue.
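As a hedged illustration, a few lines of Python run inside the workload can confirm that the uploaded files are visible. The assumption that uploads land in the workload's current working directory is only illustrative; check the STDOUT logs or your cluster documentation for the actual location.

```python
# Minimal sketch: list the files visible to the workload so you can confirm
# the uploads arrived. The working-directory assumption is illustrative and
# may not match every cluster configuration.
import os

for name in sorted(os.listdir(".")):
    print(name)
```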
Select Resources
On the Select Resources page, specify the number of GPUs (e.g., 1 GPU) and the maximum allowed runtime required for the workload. Click the Next button.
Note: The maximum number of GPUs is 8.
The time for which the workload is allowed to run is specified in the Maximum allowed runtime field. By default, 1 hour is selected.
If the maximum allowed runtime is 1 hour, the workload runs for 1 hour and is then automatically stopped, as it is not allowed to exceed the Maximum allowed runtime.
Change the Maximum allowed runtime based on the time the workload requires.
Once the workload is launched, the total workload time cannot be changed; it must be configured in this step.
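After the workload starts, a short check like the following sketch can confirm that PyTorch sees the number of GPUs requested here. The requested_gpus value is an assumption standing in for whatever was chosen in the UI.

```python
# Illustrative check (not part of the AAC UI): once the workload is running,
# confirm that PyTorch sees the number of GPUs requested in Select Resources.
import torch

requested_gpus = 1  # assumption: matches the value chosen in the UI (e.g., 1 GPU)
visible_gpus = torch.cuda.device_count()

print(f"Requested {requested_gpus} GPU(s), PyTorch sees {visible_gpus}.")
for i in range(visible_gpus):
    print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")
```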
Select Compute
Select the cluster and the desired queue to run the job. In this case, 1CN128C8G2H_2IB_MI210_SLES15 is selected. Click Next.
Review Workload Submission
Review all the selected configurations and click Run Workload.
Once the workload is submitted, its status changes to Running when the queue is available. Click on the running workload.
Monitor Workload
Users can view the system logs in the SYSLOG tab, the output in the STDOUT tab, and errors in the STDERR tab.
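As a rough guide to where a script's output ends up, the sketch below assumes the usual mapping of a process's stdout and stderr streams to these tabs.

```python
# Sketch of how a script's output maps to the workload tabs: anything printed
# to stdout shows up under STDOUT, anything written to stderr (including
# Python tracebacks) shows up under STDERR.
import sys

print("epoch 1/3 complete")                                # appears in the STDOUT tab
print("warning: learning rate reduced", file=sys.stderr)   # appears in the STDERR tab
```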
Once the interactive endpoints are enabled, there are two options available to connect:
- JupyterLab
- SSH
For JupyterLab, select jupyterlab and click Connect to launch ML Studio (JupyterLab). The password is printed in the STDOUT tab and is also available in the secret key area of the Interactive endpoints panel.
The user is presented with the JupyterLab screen. This JupyterLab instance can be used for Python-based development work.
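A minimal sanity check to run in a JupyterLab cell might look like the sketch below; it assumes nothing beyond the PyTorch install in the image (ROCm devices are exposed through PyTorch's "cuda" device type).

```python
# Minimal PyTorch sanity check for a JupyterLab cell: move a tensor to the
# GPU and run a small matrix multiplication, falling back to CPU if no GPU
# backend is available.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device)
y = x @ x.t()
print(f"Ran a {tuple(y.shape)} matmul on {device}.")
```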
Once the work in JupyterLab is done, close it.
For SSH access, select ssh and click Connect.
To access the server CLI, copy the shell command by clicking Copy shell command and copy the password by clicking Copy to clipboard.
Here the username is aac and the command is ssh -o StrictHostKeyChecking=no -p 7000 aac@aac1.amd.com
The AAC built-in service terminal can be accessed by clicking Service terminal.
Click the Finish Workload button.
Logs can be downloaded from the STDOUT tab by clicking Download Logs once the workload is finished.