How to Run a Workload

This guide explains how to launch a workload in Plexus from either the Applications or Workloads page. You will choose an application, configure resources, and submit the job.

Screenshots

The screenshots in this guide show example choices (e.g. PyTorch, a specific team). Your options may differ.

Sign in to AAC if needed (see Sign in to AAC).

You can start a workload from the application view or the workload view.

Start from the application view

  1. Open the Applications page.
  2. Select the application you want to run (for example, PyTorch).
  3. Click New Workload.

Applications list with application selected

New Workload button

Select team (application view)

If you belong to more than one team, a dialog opens so you can select the team to use. If you have only one team, this step is skipped. Click Launch to go to the Select input files page.

Team selection dialog

Start from the workload view

  1. Open the Workloads page.
  2. Click New Workload.

Workloads page with New Workload button

Select team (workload view)

If you belong to more than one team, select the team in the dialog. Click Next.

Team selection when starting from Workloads

  1. Select the application you want to run.
  2. If prompted, select the version (Docker or Singularity).
  3. Click New Workload to go to the Select input files page.

Application selected

Application version selected

Select input files

Upload the files needed for the workload.

  1. Click Upload files.
  2. Click Browse Files.
  3. Select the files (up to 5 per upload) and click Next.

Upload files button

Browse Files dialog

Selected files ready to add

Application configuration

Interactive applications

This step is not shown for interactive applications such as PyTorch, TensorFlow, or Ubuntu ROCm.

You can keep the default script or enter custom commands in:

  • Pre-run Script: Commands run before the container is created.
  • Run Script: Commands run inside the container. For some applications, a default benchmarking script is provided.
  • Post-run Script: Commands run after the container finishes.

Click Next when done.

Application configuration script fields
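As a rough illustration, a custom Run Script might look like the sketch below. The entry point name and the GPU environment variable are assumptions for the example, not Plexus defaults; check your application's provided script for the actual conventions.

```shell
#!/bin/bash
# Hypothetical Run Script sketch -- the actual default varies by application.
set -e

# Record where and when the run started, for later inspection in the logs.
echo "Run started on $(hostname) at $(date -u +%Y-%m-%dT%H:%M:%SZ)"

# HIP_VISIBLE_DEVICES is the ROCm convention for listing assigned GPUs;
# whether Plexus sets it for your workload is an assumption here.
echo "Visible GPUs: ${HIP_VISIBLE_DEVICES:-all}"

# Placeholder entry point: replace train.py with your own uploaded script.
# python3 train.py --epochs 10
```

Anything the script writes to standard output ends up in the workload logs, which is why the echo lines above can be useful when debugging a run.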

Select resources

Set the number of GPUs, maximum runtime, and other options.

Select Resources page

  • Queue oversubscribe: Off by default. If enabled, allocated resources can be shared with other workloads.

Queue oversubscribe option

  • Telemetry enabled: Off by default. If enabled, you will see real-time metrics on the workload detail page during and after the run.

Telemetry enabled option

Performance tab with metrics

  • Maximum allowed runtime: Set how long the workload can run. Default is 1 hour. You cannot change this after the workload is launched.

Maximum allowed runtime and time parameters

  • GPUs: Choose the number of GPUs (for HPC applications, match what your run script expects; for AI/ML, base it on your workload's needs).

GPU and resource configuration

Click Next.

Select compute

Based on the number of GPUs you selected, a list of available queues is shown. Choose a queue by accelerator type and operating system. A node in that queue is assigned when one is free; if all nodes are busy, the workload remains in Pending until a node becomes available.

Queue selection

Click Next.

Review workload submission

Review the configuration from the previous steps. Use Change on any section to adjust it, then click Run Workload.

Review workload submission page

The workload is submitted to the selected queue and starts when resources are ready. When it finishes successfully, the status changes to Finished.

Workload parameters and status