
CLI Guide with Slurm Basics

Problem

You need to access the cluster on the AMD remote server and must follow specific steps to log in and allocate the necessary resources.

Instructions for logging in

Step 1: SSH login

  1. Log in to the cluster using the following SSH command:
ssh -i <priv key> <username>@<cluster domain name, e.g., aac13.amd.com>

Replace <priv key> with the path to your private key and <username> with your actual username.
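As a sketch, the login command can be assembled from your own values. The variable names and sample values below are illustrative placeholders, not real defaults; the command is echoed rather than executed so you can review it before connecting.

```shell
# Illustrative placeholders -- substitute your own key path, username, and host.
KEY_PATH="$HOME/.ssh/id_rsa"   # path to your private key
USER_NAME="jdoe"               # your cluster username
CLUSTER="aac13.amd.com"        # cluster domain name

# SSH refuses keys that are readable by other users, so restrict permissions.
chmod 600 "$KEY_PATH" 2>/dev/null || true

# Print the command you would run (review, then run it yourself).
echo "ssh -i $KEY_PATH $USER_NAME@$CLUSTER"
```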

Step 2: Allocate a node

Once logged in, allocate a reserved node. Note that you cannot SSH directly into the reserved node; instead, use the salloc command to access it.

For Exclusive Access to One Node with All 8 GPUs:

Run the following command:

salloc --mem=0 --gres=gpu:8 --reservation=<reservation, e.g., PU045A_reservation>

salloc: (Required) This command allocates resources for a job interactively.

--mem=0: (Optional) Requests all available memory on the node, which is useful if your application requires maximum memory without limitation. If not specified, Slurm assigns a default amount of memory based on the node configuration.

--gres=gpu:8: (Required) Requests 8 GPU resources for your job.

--reservation=: (Required) This specifies the reservation to use. Replace with your actual reservation name (e.g., PU045A_reservation).
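Once salloc grants the allocation, Slurm exports job-related environment variables in your interactive shell on the node. A minimal sketch for checking what you were given (the fallback strings are illustrative):

```shell
# Print the key Slurm variables set inside an allocation.
# Outside an allocation these variables are unset, so fallbacks are shown.
show_alloc() {
  echo "Job ID:    ${SLURM_JOB_ID:-not in an allocation}"
  echo "Node list: ${SLURM_JOB_NODELIST:-not in an allocation}"
  echo "GPU count: ${SLURM_GPUS_ON_NODE:-not set}"
}
show_alloc
```

Running this right after salloc returns is a quick way to confirm the job ID and node name before starting work.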

For Exclusive Access to One Node with 1 GPU:

Use the command:

salloc --reservation=<reservation, e.g., PU045A_reservation> --exclusive --mem=256G --gres=gpu:1

--reservation=: (Required) As above, specifies the reservation.

--exclusive: (Optional) Requests exclusive access to the allocated node, ensuring no other jobs can run on it while yours is active. Use it when you require dedicated resources.

--mem=256G: (Optional) Requests 256 GB of memory for your job. Use when you want to ensure your job has enough memory.

--gres=gpu:1: (Required) Requests 1 GPU resource for your job.

For Shared Access to One Node with 1 GPU:

Execute this command:

salloc --mem=128G --gres=gpu:1 --reservation=<reservation, e.g., PU045A_reservation>

--mem=128G: (Optional) Requests 128 GB of memory for your job; adjust the amount based on your job's requirements.

--gres=gpu:1: (Required) Requests 1 GPU resource for your job.

--reservation=: (Required) Specifies the reservation to use, similar to the previous commands.
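The three variants above differ only in their flags, so as a sketch they can be wrapped in one helper that composes the right salloc line for a given mode. The function name and mode labels (full, single, shared) are invented here for illustration; the command prints rather than runs, so you can inspect it first.

```shell
# Compose the salloc command for the three access patterns described above.
# Flag order does not matter to salloc; this mirrors the examples in the text.
make_salloc() {
  mode="$1"; reservation="$2"
  case "$mode" in
    full)   echo "salloc --mem=0 --gres=gpu:8 --reservation=$reservation" ;;
    single) echo "salloc --exclusive --mem=256G --gres=gpu:1 --reservation=$reservation" ;;
    shared) echo "salloc --mem=128G --gres=gpu:1 --reservation=$reservation" ;;
    *)      echo "usage: make_salloc full|single|shared <reservation>" >&2; return 1 ;;
  esac
}

make_salloc shared PU045A_reservation
# prints: salloc --mem=128G --gres=gpu:1 --reservation=PU045A_reservation
```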

Step 3: Load ROCm environment

Once you are logged into the node, choose your module and load the ROCm environment by running the following commands:

module avail
module load <ROCm module>
Example: module load rocm/6.4.2
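After loading the module, it is worth confirming that the ROCm tools are actually on your PATH before launching work. A minimal sketch (the function name and messages are illustrative):

```shell
# Check whether the ROCm tooling resolved after `module load`.
# rocm-smi is the ROCm GPU status tool; if it is missing, the module
# was likely not loaded in this shell.
check_rocm() {
  if command -v rocm-smi >/dev/null 2>&1; then
    echo "ROCm tools found on PATH"
  else
    echo "ROCm tools not found; run: module load <ROCm module>"
  fi
}
check_rocm
```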

If you use MobaXterm or PuTTY, see Connecting to the AAC environment.