How to run your own Docker interactive workload
This document shows how to run your own Docker workload in interactive mode, which provides an SSH connection to the workload.
To start creating a workload, sign in to the AAC web platform.
Permissions
To run your own Docker container, you need Developer or Admin permissions. Contact support to request them.
New Workload
- Click on the Workloads option in the top bar menu.
- Once the workload view is open, click the New Workload button located at the top right.
Select Team
If the user is assigned to more than one team, a pop-up window asks the user to select one of the customer teams to which the user belongs. If only one team is assigned to the user, this step is skipped.
Note: In this example, the selected team is AMD Internal.
Click the Next button located at the top right.
Configure your own container
The following view shows two tabs:
- Docker container: Allows you to configure your own container on the fly.
- Application list: Allows you to select a preconfigured application.
The Docker container tab configures the container URL and the credentials for private images.
- General information:
  - SSH enabled: Determines whether the workload is SSH interactive or batch.
  - Container url: URL of the Docker image.
- Repository Authorization:
  - User: Username for logging in to the Docker repository server.
  - Password: Password for logging in to the Docker repository server.
  - Server: Docker repository server. The default is Docker Hub (https://index.docker.io/v1/).
This document covers interactive mode, so the SSH service is enabled and the workload exposes an SSH endpoint to connect to.
Fill in the container URL. The credential fields are required only when the image is private.
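The fields of the Docker container tab can be sketched as a small record with a completeness check. This is only an illustration of the rules described above, not an AAC API: the class name, field names, the example image references, and the rule that user and password must be given together are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Default repository server named in this document (Docker Hub).
DEFAULT_SERVER = "https://index.docker.io/v1/"


@dataclass
class ContainerConfig:
    """Hypothetical mirror of the Docker container tab; not an AAC API."""
    container_url: str
    ssh_enabled: bool = True        # True = SSH interactive, False = batch
    user: Optional[str] = None      # only needed for private images
    password: Optional[str] = None  # only needed for private images
    server: str = DEFAULT_SERVER

    def problems(self) -> List[str]:
        """Return readable problems; an empty list means the form looks complete."""
        issues = []
        if not self.container_url.strip():
            issues.append("container url is required")
        # Assumption: credentials are optional, but a user without a
        # password (or the reverse) is treated as incomplete.
        if bool(self.user) != bool(self.password):
            issues.append("user and password must be given together")
        return issues


# Illustrative image references; replace them with your own.
public = ContainerConfig(container_url="docker.io/library/ubuntu:22.04")
half_filled = ContainerConfig(
    container_url="registry.example.com/team/app:1.0", user="alice"
)
print(public.problems())
print(half_filled.problems())
```

A public image needs only the container URL, while the second example is flagged because a user was entered without a password.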
Select Input Files
Upload any input files that the application needs to run. Click Upload files, and then drag the files into AAC, or click Browse files to open the file dialog window. In this example no input files are required, so click Next to proceed.
Request Resources
In the following view, we can request the resources we need for our workload.
- Number of GPUs: Number of GPU devices required in the compute resources. The default is 1 GPU.
- Maximum allowed runtime: The time the workload may run before the system cancels it. The default is 1 hour. Only compute resources with an equal or higher maximum runtime are offered.
- Oversubscribe: Selects compute resources that allow workloads to share the same GPU. Disabled by default.
- Schedule: Schedules the workload to run at a future time, specified in UTC.
- Telemetry: Enables workload performance tracking, showing GPU, CPU, memory, network, I/O, and other metrics. Telemetry is disabled by default because it can slightly affect compute performance.
Once the workload is launched, the user cannot change the total workload time. It must be configured in this step.
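The defaults listed above, and the UTC requirement of the Schedule field, can be summarized in a short sketch. The `ResourceRequest` record and the `to_utc` helper are hypothetical names used for illustration; AAC itself is configured through the web form, and treating an empty schedule as "run as soon as possible" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ResourceRequest:
    """Hypothetical record of the Request Resources form, with its defaults."""
    gpus: int = 1                                # 1 GPU by default
    max_runtime: timedelta = timedelta(hours=1)  # 1 hour by default
    oversubscribe: bool = False                  # disabled by default
    telemetry: bool = False                      # disabled by default
    schedule_utc: Optional[datetime] = None      # assumption: None = no schedule


def to_utc(moment: datetime) -> datetime:
    """Convert a timezone-aware local time to UTC for the Schedule field."""
    if moment.tzinfo is None:
        raise ValueError("schedule time must carry a timezone")
    return moment.astimezone(timezone.utc)


# Example: 09:30 in a UTC+2 timezone corresponds to 07:30 UTC.
local_start = datetime(2030, 1, 15, 9, 30, tzinfo=timezone(timedelta(hours=2)))
request = ResourceRequest(
    max_runtime=timedelta(hours=2), schedule_utc=to_utc(local_start)
)
print(request.schedule_utc.isoformat())  # 2030-01-15T07:30:00+00:00
```

Because the runtime cannot be changed after launch, it is worth choosing `max_runtime` with some margin before submitting.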
Click the Next button in the top right corner.
Select Compute Resources
Select the cluster and queue that are assigned to the team and are available to run the job.
In this case, a resource queue with MI250 is available, so we select it by clicking it.
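The queues offered in this step depend on the resources requested earlier. A minimal sketch of that filtering, assuming a hypothetical `Queue` record with illustrative names and runtime limits, might look as follows. How AAC treats shared queues when Oversubscribe is disabled is not stated here, so the sketch applies no filter in that case.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class Queue:
    """Hypothetical description of a compute queue; fields are illustrative."""
    name: str
    gpu_model: str
    max_runtime: timedelta
    allows_oversubscribe: bool


def eligible_queues(
    queues: List[Queue], requested_runtime: timedelta, oversubscribe: bool = False
) -> List[Queue]:
    """Keep queues whose maximum runtime is equal to or higher than the request."""
    result = []
    for queue in queues:
        if queue.max_runtime < requested_runtime:
            continue  # queue would cancel the workload too early
        if oversubscribe and not queue.allows_oversubscribe:
            continue  # GPU sharing requested but not offered by this queue
        result.append(queue)
    return result


# Made-up queues for illustration only.
queues = [
    Queue("short", "MI250", timedelta(minutes=30), allows_oversubscribe=False),
    Queue("long", "MI250", timedelta(hours=4), allows_oversubscribe=True),
]
print([q.name for q in eligible_queues(queues, timedelta(hours=1))])
```

With a one-hour request, only the queue whose limit is at least one hour remains selectable.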
Click the Next button in the top right corner.
Review Workload Submission
Review the information that has been entered for this workload. If any change is needed, click the Change button in the appropriate section to make revisions.
Click Run Workload.
Monitor Workload
Each workload goes through several states after it is submitted:
- Created – The workload has been created in the system.
- Sent – The workload has been sent to the queue that you selected in the workload submission process.
- Pending – The workload is in a waiting state in the queue.
- Running – The workload has started running in the selected queue.
- Finished – The interactive workload has been stopped by the user.
- Failed – A problem has occurred which has prevented the workload from completing successfully.
- Cancelled – The workload has been cancelled by the system (maximum runtime exceeded).
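The states above can be modelled as an enumeration, which is handy when scripting around a workload. In this sketch, `get_state` is a hypothetical callable that returns the current state label; this document does not describe an API for it. Treating Finished, Failed, and Cancelled as final states is an inference from the descriptions above.

```python
import time
from enum import Enum
from typing import Callable


class WorkloadState(Enum):
    CREATED = "Created"
    SENT = "Sent"
    PENDING = "Pending"
    RUNNING = "Running"
    FINISHED = "Finished"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


# Assumption: the workload does not leave these states once it reaches them.
TERMINAL_STATES = {
    WorkloadState.FINISHED,
    WorkloadState.FAILED,
    WorkloadState.CANCELLED,
}


def wait_for_running(
    get_state: Callable[[], str],
    poll_seconds: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the workload is Running; return False if it ends or times out."""
    for _ in range(max_polls):
        state = WorkloadState(get_state())
        if state is WorkloadState.RUNNING:
            return True
        if state in TERMINAL_STATES:
            return False
        sleep(poll_seconds)
    return False


# Usage with a canned sequence of states instead of a real platform.
labels = iter(["Created", "Sent", "Pending", "Running"])
print(wait_for_running(lambda: next(labels), sleep=lambda seconds: None))  # True
```

The `sleep` parameter is injectable so the loop can be exercised without waiting.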
SSH connection
Once the workload is running, the interactive endpoints appear in the right corner within a few minutes. It can take longer if the image needs to be pulled to the node.
Click the Connect button to open a small window with the SSH connection details and the password.
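The connection details from the Connect window can be turned into a standard OpenSSH command. The host, port, and user below are placeholders, not real AAC values; use the ones shown in the window. The password is deliberately left out of the command and is typed at the SSH prompt.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class SshEndpoint:
    """Hypothetical holder for the values shown in the Connect window."""
    host: str
    port: int
    user: str


def ssh_command(endpoint: SshEndpoint) -> List[str]:
    """Build the argument list for the OpenSSH client (-p selects the port)."""
    return ["ssh", "-p", str(endpoint.port), f"{endpoint.user}@{endpoint.host}"]


# Placeholder values for illustration only.
endpoint = SshEndpoint(host="aac.example.com", port=2222, user="aac")
print(shlex.join(ssh_command(endpoint)))  # ssh -p 2222 aac@aac.example.com
```

The printed command can be pasted into a terminal, or the list can be passed to `subprocess.run` to open the session from a script.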
Logs
The user can check the system, stdout, and stderr logs by clicking the respective tabs.
- View SysLog - Information about the workload throughout the entire process.
- View Stdout - Standard output that presents the output of a workload and sometimes includes the results of the workload.
- View Stderr - Standard error output that helps you understand why you may have encountered certain issues during the process.