Glossary

Definitions of terms used in AMD Accelerator Cloud (AAC) documentation. Use these pages as the single source for common concepts; link here instead of repeating definitions in other articles.

Platform and access

AAC (AMD Accelerator Cloud)
A private hosted platform that provides access to AMD hardware, tools, and ready-to-use application software. AAC includes the Plexus web platform and Bare Metal (direct cluster) access.

Plexus
The web-based platform for AAC. You use Plexus to sign in, run workloads, manage files, and access applications without direct SSH to clusters. See What's Plexus.

Bare Metal
Direct access to AAC clusters via SSH and Slurm. Intended for advanced users who run containerized or native workloads directly on the cluster. See Bare Metal overview.

Compute and scheduling

Slurm
An open-source job scheduler used on AAC clusters. Slurm manages partitions, nodes, and jobs. Commands such as salloc, sbatch, and squeue are Slurm commands.

partition
A Slurm concept: a named set of compute nodes with specific hardware and limits. In Plexus, queues are derived from Slurm partitions. On Bare Metal, you specify a partition when allocating resources (e.g. salloc -p <partition_name>). See Bare metal prerequisites for AAC partition naming.

queue
In Plexus, a logical grouping of compute nodes derived from a Slurm partition. Each queue has resource limits, time limits, and access controls. You choose a queue when creating a workload. See What's granted in Plexus.

workload
A unit of work you submit (e.g. running an application on a cluster). In Plexus, a workload corresponds to a Slurm job. See What's a Plexus workload.

compute node
A physical or virtual machine in the cluster that runs jobs. When you allocate resources (e.g. with salloc), you get access to one or more compute nodes.

AMD software and hardware

ROCm
AMD's open software platform for GPU computing. Many AAC applications and Bare Metal environments use a specific ROCm version (e.g. ROCm 6.4.2). Load it on the cluster with module load rocm-<version>.

HIP (Heterogeneous-computing Interface for Portability)
AMD's GPU programming interface. Frameworks such as PyTorch use HIP to run on AMD GPUs. HIP is included in the ROCm stack.

Slurm commands (Bare Metal)

salloc
Allocates resources interactively (e.g. a node with GPUs). You run salloc from the login node, then run commands on the allocated node. Example: salloc --exclusive --mem=0 --gres=gpu:8 -p <partition_name>.

sbatch
Submits a batch script to the scheduler. The job runs when resources are available. Use the -p option to specify the partition.
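A minimal batch script might look like the following sketch. The partition placeholder, resource requests, and job name here are illustrative, not AAC defaults; consult Bare metal prerequisites for the actual partition names and limits.

```shell
#!/bin/bash
#SBATCH -p <partition_name>   # partition to run in (see sinfo for names)
#SBATCH --gres=gpu:1          # request one GPU (illustrative)
#SBATCH --time=01:00:00       # wall-time limit (illustrative)
#SBATCH -J example-job        # job name (illustrative)

srun hostname                 # replace with your actual workload
```

Submit the script with sbatch from the login node; squeue then shows the job as pending until resources free up, and running once it starts.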

squeue
Shows the status of submitted jobs (e.g. pending, running).
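squeue prints plain columnar text, so its output is easy to post-process. A minimal sketch, assuming squeue's default column order; the sample jobs and partition name below are fabricated for illustration:

```shell
# Sample of squeue's default output format (fabricated jobs for illustration)
squeue_output='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
101 mi300x train alice R 1:02:11 1 node001
102 mi300x eval bob PD 0:00 1 (Resources)'

# The ST column holds the job state: R = running, PD = pending.
# Skip the header (NR > 1) and count the running jobs:
echo "$squeue_output" | awk 'NR > 1 && $5 == "R"' | wc -l
```

On the cluster itself, squeue -u $USER limits the listing to your own jobs instead of parsing the full queue.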

sinfo
Shows partition and node information (e.g. available partitions and node states).