Bare metal prerequisites

Before following any Bare Metal guide (for example, running applications on the AAC Slurm cluster), make sure you meet the prerequisites below. Individual guides may add their own requirements, such as a specific ROCm version or an API token.

Access to the AAC environment

  • You must have Bare Metal access (SSH access to the cluster). This is separate from Plexus account access. Contact your AMD sponsor to request access and submit your SSH public key.
  • You need your SSH private key and the cluster hostname (e.g. aac13.amd.com) to connect. See Connecting to the AAC environment for PuTTY and MobaXterm, or the AAC Slurm Cluster User Guide for SSH login and Slurm basics.
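
For a plain OpenSSH client (Linux, macOS, or a recent Windows terminal), the login can be sketched as follows. The username and key path are placeholders assumed for illustration; only the example hostname comes from this page. Use the values issued for your account.

```shell
# Placeholders: replace with the values issued for your account.
AAC_USER="your_username"            # hypothetical username
AAC_HOST="aac13.amd.com"            # example hostname from this page
AAC_KEY="$HOME/.ssh/id_ed25519"     # assumed private key location

# Wrapped in a function so nothing connects until you call it.
aac_login() {
    # -i selects the private key matching the public key you submitted.
    ssh -i "$AAC_KEY" "${AAC_USER}@${AAC_HOST}"
}

# Usage: aac_login
```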

Partition naming and node allocation

On the AAC Slurm cluster, jobs and interactive sessions run in partitions (queues). Partition names follow a convention that encodes CPUs, GPUs, GPU hives, GPU product, and OS (e.g. 256C8G1H_MI355X_Ubuntu22). Your account is granted access to one or more specific partitions.
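
To make the convention concrete, here is a minimal sketch that splits the example name into its fields. The layout <CPUs>C<GPUs>G<hives>H_<GPU product>_<OS> is an assumption inferred from that single example, and parse_partition is a hypothetical helper; the AAC Slurm Cluster User Guide remains the authoritative description.

```shell
# Split a partition name of the assumed form
#   <CPUs>C<GPUs>G<hives>H_<GPU product>_<OS>
# using POSIX parameter expansion only (no external tools).
parse_partition() {
    name=$1
    hw=${name%%_*}      # hardware field, e.g. "256C8G1H"
    rest=${name#*_}     # remainder, e.g. "MI355X_Ubuntu22"
    gpu=${rest%%_*}     # GPU product, e.g. "MI355X"
    os=${rest#*_}       # OS, e.g. "Ubuntu22"
    cpus=${hw%%C*}      # digits before "C"
    tmp=${hw#*C}        # e.g. "8G1H"
    gpus=${tmp%%G*}     # digits before "G"
    tmp=${tmp#*G}       # e.g. "1H"
    hives=${tmp%H}      # digits before the trailing "H"
    printf 'cpus=%s gpus=%s hives=%s gpu=%s os=%s\n' \
        "$cpus" "$gpus" "$hives" "$gpu" "$os"
}

parse_partition 256C8G1H_MI355X_Ubuntu22
# prints: cpus=256 gpus=8 hives=1 gpu=MI355X os=Ubuntu22
```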

To see your partitions and allocate a node, use Slurm commands such as sinfo and salloc. For the full partition naming convention and step-by-step instructions for login, allocation, and batch submission, see the AAC Slurm Cluster User Guide.
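
A typical sequence on a login node might look like the sketch below. The partition is the example name from this page, and the node count and time limit are assumed values; substitute a partition your account can use and the limits your guide recommends.

```shell
# Example partition from this page; substitute one your account can use.
PARTITION="256C8G1H_MI355X_Ubuntu22"

if command -v sinfo >/dev/null 2>&1; then
    # Summarize the partitions visible to you.
    sinfo --summarize
    # Show node states for the chosen partition only.
    sinfo --partition="$PARTITION"
else
    echo "sinfo not found: run this on an AAC login node" >&2
fi

# Request one node interactively for one hour (assumed values).
# Defined as a function so the request is made only when you call it.
aac_alloc() {
    salloc --partition="$PARTITION" --nodes=1 --time=01:00:00
}

# Usage: aac_alloc
```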

Common software

Many Bare Metal guides assume one or more of the following on the allocated node:

  • ROCm – AMD's GPU computing platform. Load a specific version via environment modules, for example: module load rocm-6.4.2. Check available versions with module avail. The AAC Slurm Cluster User Guide describes the standard module setup for SBATCH scripts.
  • Podman – Used in several guides to run containerized workloads (e.g. PyTorch, Megatron, NanoGPT). See How to use Podman if you need to install or use Podman on the cluster.
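
Once you are on an allocated node, a quick check of both tools could look like this sketch. The module name is the example from this page and may not match what module avail lists on your cluster; the guards are only there so the snippet reports a missing tool instead of failing.

```shell
# Module name taken from this page; installed versions may differ.
ROCM_MODULE="rocm-6.4.2"

# 'module' is provided by environment modules and may be undefined
# in shells where it has not been initialized.
if command -v module >/dev/null 2>&1; then
    module avail                 # list the modules installed on this node
    module load "$ROCM_MODULE"   # load the requested ROCm version
    module list                  # confirm what is now loaded
else
    echo "environment modules not available in this shell" >&2
fi

# Report whether Podman is installed.
if command -v podman >/dev/null 2>&1; then
    podman --version
else
    echo "podman not found; see How to use Podman" >&2
fi
```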

For definitions of terms such as Slurm, partition, ROCm, and HIP, see the Glossary.