PyTorch Docker
This document shows how to run the PyTorch Docker application on AAC.
How To Launch PyTorch Docker Application
To start creating a workload, sign in to the AAC.
Select Application
Click Applications in the navigation menu. Then select the PyTorch application with the preferred version from the list.
Note: In this example, Pytorch_2_4_0_Rocm6_3_0 (Jupyterlab_4_2_5) is selected.
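Once the workload is up and running (later in this guide), a quick check such as the sketch below can confirm that the container's PyTorch and ROCm builds match the selected image. This is an illustrative snippet, not part of the AAC UI; the exact version strings reported may differ slightly from the image name.

```python
# Illustrative sanity check: confirm the container's PyTorch and ROCm builds
# roughly match the selected image (Pytorch_2_4_0_Rocm6_3_0). Run it inside
# the workload, e.g. in a JupyterLab cell or over SSH.
import torch

print("PyTorch version:", torch.__version__)       # expected to start with 2.4
print("ROCm/HIP build:", torch.version.hip)        # None would indicate a non-ROCm build
print("GPU backend available:", torch.cuda.is_available())
```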
New Workload
Click the New Workload button in the top right corner.
Select Team
If the user is assigned to more than one team, a pop-up window prompts the user to select one of the customer teams they belong to. If only one team is assigned to the user, this step is not required.
Note: In this example, the team AMD Internal is selected.
Click the Launch button.
Select input files
The files required for the workload can be uploaded in this step by clicking Upload files. Click the Next button to continue.
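As a hedged illustration, a few lines of Python run inside the workload can confirm that the uploaded files are visible. The assumption that uploads land in the workload's current working directory is only illustrative; check the STDOUT logs or your cluster documentation for the actual location.

```python
# Minimal sketch: list the files visible to the workload so you can confirm
# the uploads arrived. The working-directory assumption is illustrative and
# may not match every cluster configuration.
import os

for name in sorted(os.listdir(".")):
    print(name)
```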
Select Resources
On the Select Resources page, specify the number of GPUs (e.g., 1 GPU) and the maximum allowed runtime required for the workload. Click the Next button.
Note: The maximum number of GPUs is 8.
The time for which the workload is allowed to run is specified in the Maximum allowed runtime field. By default, 1 hour is selected.
If the maximum allowed runtime is 1 hour, the workload runs for 1 hour and is then automatically stopped, as it is not allowed to exceed the Maximum allowed runtime.
Change the Maximum allowed runtime based on the time the workload requires.
Once the workload is launched, the total workload time cannot be changed; it must be configured in this step.
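After the workload starts, a short check like the following sketch can confirm that PyTorch sees the number of GPUs requested here. The requested_gpus value is an assumption standing in for whatever was chosen in the UI.

```python
# Illustrative check (not part of the AAC UI): once the workload is running,
# confirm that PyTorch sees the number of GPUs requested in Select Resources.
import torch

requested_gpus = 1  # assumption: matches the value chosen in the UI (e.g., 1 GPU)
visible_gpus = torch.cuda.device_count()

print(f"Requested {requested_gpus} GPU(s), PyTorch sees {visible_gpus}.")
for i in range(visible_gpus):
    print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")
```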
Select Compute
Select the cluster and the desired queue to run the job. In this case, 1CN128C8G2H_2IB_MI210_SLES15 is selected. Click Next.
Review Workload Submission
Review all the selected configurations and click Run Workload.
Once the workload is submitted, its status changes to Running when the queue is available. Click on the running workload.
Monitor Workload
Users can view the system logs in the SYSLOG tab, the output in the STDOUT tab, and errors in the STDERR tab.
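As a rough guide to where a script's output ends up, the sketch below assumes the usual mapping of a process's stdout and stderr streams to these tabs.

```python
# Sketch of how a script's output maps to the workload tabs: anything printed
# to stdout shows up under STDOUT, anything written to stderr (including
# Python tracebacks) shows up under STDERR.
import sys

print("epoch 1/3 complete")                                # appears in the STDOUT tab
print("warning: learning rate reduced", file=sys.stderr)   # appears in the STDERR tab
```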
Once the interactive endpoints are enabled, there are two options available to connect:
- JupyterLab
- SSH
For JupyterLab, select jupyterlab and click Connect to launch ML Studio (JupyterLab). The password is printed in the STDOUT tab and is also available in the secret key area of the Interactive endpoints panel.
The user is presented with the JupyterLab screen. This JupyterLab instance can be used for Python-based development work.
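A minimal sanity check to run in a JupyterLab cell might look like the sketch below; it assumes nothing beyond the PyTorch install in the image (ROCm devices are exposed through PyTorch's "cuda" device type).

```python
# Minimal PyTorch sanity check for a JupyterLab cell: move a tensor to the
# GPU and run a small matrix multiplication, falling back to CPU if no GPU
# backend is available.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device)
y = x @ x.t()
print(f"Ran a {tuple(y.shape)} matmul on {device}.")
```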
Once the work in JupyterLab is done, close it.
For SSH access, select ssh and click Connect.
To access the server CLI, copy the shell command by clicking Copy shell command and copy the password by clicking Copy to clipboard.
Here the username is aac and the command is ssh -o StrictHostKeyChecking=no -p 7000 aac@aac1.amd.com
The AAC built-in service terminal can be accessed by clicking Service terminal.
Click the Finish Workload button.
Logs can be downloaded from the STDOUT tab by clicking Download Logs once the workload is finished.