How to Run a Workload
This document shows how to run a workload on the AAC web platform.
To start creating a workload, sign in to the AAC web platform.
A workload can be launched by following the steps below.
Select Application
In this step, select the application for which the workload is to be run. This can be done in either of the two ways described below.
From the Applications page:
a. Navigate to the Applications page.
b. Select the application for which the workload is to be run.
Note: In this case, we have selected HPCG as the application.
c. Select the desired version if prompted. The container type can be either Docker or Singularity.
Note: In this case, we have selected the rocHPCG 3.1.0_97 version and the Docker container.
d. Click the New Workload button.
From the Workloads page:
a. Navigate to the Workloads page and click the New Workload button.
b. Select the desired application for which the workload is to be run.
Note: In this case, we have selected HPCG as the application.
c. Select the desired version if prompted. The container type can be either Docker or Singularity.
Note: In this case, we have selected the rocHPCG 3.1.0_97 version and the Docker container.
Click the Next button.
Select Team
If the user is assigned to more than one team, a pop-up window will prompt them to select one of the customer teams they belong to. If the user is assigned to only one team, this selection is not required.
Note: In this case, we have selected AMD Internal as the team.
Click the Start new workload button.
Select Input Files
The files required for the workload can be uploaded in this step.
a. Click Upload files.
b. Click Browse Files.
c. A maximum of 5 files can be uploaded at a time.
d. Select the required files and click Next.
Application Configuration
The user can either continue with the default scripts or enter custom commands into the following script fields.
Note: The Application Configuration step is not available for interactive applications such as PyTorch, TensorFlow, Jammy, etc.
a. Pre-run Script: Commands that need to be executed before the container is created can be added to the Pre-run Script field.
b. Run Script: Commands that need to be executed inside the created container can be added to the Run Script field. For the selected application, a benchmarking script for 8 GPUs is available in this field by default. Refer to the AMD Infinity Hub for more benchmarking commands.
c. Post-run Script: Commands that need to be executed after the container run can be added to the Post-run Script field.
A combined sketch of the three script fields is shown below. Once the scripts are in place, click Next.
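For reference, the three fields might look like the following minimal sketch. This is only an illustration, not the platform's default script: the binary name, problem-size arguments, and file names are placeholders, and the actual rocHPCG benchmark command should be taken from the default Run Script or from the AMD Infinity Hub page for the application.

# Pre-run Script (executed before the container is created)
echo "Starting pre-run step"
ls -l .                                  # for example, confirm the uploaded input files are present

# Run Script (executed inside the created container)
# Hypothetical 8-GPU benchmark launch; the rank count should match the
# number of GPUs selected later in the Select Resources step.
mpirun -np 8 rochpcg 280 280 280 1860    # problem size and run time are placeholder values

# Post-run Script (executed after the container run)
echo "Run finished"
tar -czf results.tar.gz ./*.log          # for example, archive any log output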
Select Resources
Required inputs for the workload, such as the number of GPUs and the maximum allowed runtime, can be entered in this step.
a. Queue oversubscribe: Disabled by default.
Enabling Queue oversubscribe means the resources allocated to the workload will be shared with other workloads.
b. Telemetry enabled: Telemetry is enabled by default.
If telemetry is enabled, real-time metrics appear on the workload's detail page, and once the workload is completed, a Performance tab is shown. If disabled, neither the real-time metrics nor the Performance tab is shown on the workload's detail page.
c. Maximum allowed runtime: The time for which the workload is allowed to run should be specified in the Maximum allowed runtime field. By default, 1 hour is selected.
If the maximum allowed runtime is 1 hour, the workload will run for at most 1 hour and will be stopped automatically once that limit is reached, as it is not allowed to exceed the Maximum allowed runtime.
The user should change the Maximum allowed runtime based on the time the workload requires.
Example: If the Maximum allowed runtime is changed to 2 hours and 30 minutes, the workload is allowed to run for a maximum duration of 2 hours and 30 minutes.
Once the workload is launched, the user cannot change the total workload time; it has to be configured before running.
d. Scheduled: Scheduling a job is optional. This step can be skipped if you want to run the job immediately. The Scheduled field can be used to schedule the job for a desired date and time, and the schedule can be made recurring daily or weekly. To schedule a job, click the toggle button shown below.
The time at which the workload has to be triggered and how frequently it should recur (i.e., every how many days or on which weekdays) can be defined under the Scheduled field.
Note: Example 1: Here, the workload runs every two days at 15:30. The last day is 9/20/2023, after which the workload will not trigger.
Example 2: Here, the workload triggers every Tuesday and Wednesday at 19:15. The workload will not trigger after 9/21/2023.
The number of GPUs required for the workload has to be selected in this step. For HPC applications, it should be selected based on the run script provided on the Application Configuration page, whereas for AI/ML applications, the number of GPUs should be selected as required by the user.
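For an HPC application, one way to determine the GPU count is to read the rank count from the launcher line in the Run Script. A minimal sketch, assuming the run script has been copied into a hypothetical local file named run_script.sh and launches one MPI rank per GPU:

# Print the MPI rank count from the script; select that many GPUs in this step.
# For a line such as "mpirun -np 8 rochpcg ...", this prints 8.
grep -oE 'mpirun[[:space:]]+-np[[:space:]]+[0-9]+' run_script.sh | grep -oE '[0-9]+$'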
Click Next to proceed.
Select Compute
Based on the number of GPUs selected in the previous step, a list of available queues is displayed. The required queue can be selected in this step based on accelerator type and OS.
One of the nodes in the selected queue will be assigned to the workload based on availability. If the node is occupied, the workload goes into a pending state; once the node is available, the workload starts running.
Note: In this case, 1CN128C8G2H_2IB_MI210_SLES15 is selected.
Click Next.
Review Workload Submission
Review all the configurations selected in the previous steps. If required, click the Change button in any section to modify its configuration.
Click RUN WORKLOAD
The workload will be submitted to the selected queue and will start when the requested resources are ready. Once the workload finishes successfully, its status changes to Completed.