Accessing the Cluster
Problem
Users need to access the cluster on the AMD remote server and must follow specific steps to log in and allocate the necessary resources.
Instructions for Logging In
Step 1: SSH Login
1. Log in to the server: use the following SSH command to connect to the cluster:
ssh -i <priv key> Username@<cluster domain name, e.g., aac1.amd.com>
Replace <priv key> with the path to your private key file and Username with your actual username.
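For illustration, a filled-in invocation might look like the following; the key path and username here are hypothetical:
ssh -i ~/.ssh/aac_key jdoe@aac1.amd.com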
Step 2: Allocate a Node
Once logged in, you will need to allocate a reserved node. Note that you cannot SSH directly to the reserved node; instead, use the salloc command to access it.
For Exclusive Access to One Node with All 8 GPUs:
Run the following command:
salloc --mem=0 --gres=gpu:8 --reservation=<reservation, e.g., s30-05_Reservation>
salloc: (Required) This command allocates resources for a job interactively.
--mem=0: (Optional) Requests all available memory on the node, which is useful if your application requires maximum memory. If not specified, SLURM assigns a default amount of memory based on the node configuration.
--gres=gpu:8: (Required) Requests 8 GPU resources for your job.
--reservation=<reservation>: (Required) Requests nodes from the named reservation assigned to you; a filled-in example is shown below.
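As a sketch, a filled-in version of the command above using the sample reservation name might look like this; once the node is allocated and the ROCm environment is loaded (see Step 3), rocm-smi should report all 8 GPUs:
salloc --mem=0 --gres=gpu:8 --reservation=s30-05_Reservation
rocm-smi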
For Exclusive Access to One Node with 1 GPU:
Use the command:
salloc --reservation=<reservation, e.g., s30-05_Reservation> --exclusive --mem=256G --gres=gpu:1
--reservation=<reservation>: (Required) Requests nodes from the named reservation assigned to you.
--exclusive: (Optional) Requests exclusive access to the allocated node, ensuring no other jobs can use the same node while yours is running. Use it when you require dedicated resources.
--mem=256G: (Optional) Requests 256 GB of memory for your job. Use it when you want to ensure your job has enough memory.
--gres=gpu:1: (Required) Requests 1 GPU resource for your job.
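For illustration, a filled-in exclusive single-GPU allocation with the sample reservation name, followed by an optional check that the allocation is running:
salloc --reservation=s30-05_Reservation --exclusive --mem=256G --gres=gpu:1
squeue -u $USER   # confirm the allocation appears in the queue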
For Shared Node with 1 GPU Access:
Execute this command:
salloc --mem=128G --gres=gpu:1 --reservation=<reservation, e.g., s30-05_Reservation>
--mem=128G: (Optional) Requests 128 GB of memory for your job. You can adjust the amount based on your job's requirements.
--gres=gpu:1: (Required) Requests 1 GPU resource for your job.
--reservation=<reservation>: (Required) Requests nodes from the named reservation assigned to you; a filled-in example is shown below.
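Similarly, a filled-in shared-node example, again using the sample reservation name from above:
salloc --mem=128G --gres=gpu:1 --reservation=s30-05_Reservation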
Step 3: Load ROCm Environment
Once you are logged into the node, list the available modules and load the ROCm environment by running the following commands:
module avail
module load <modulefile>
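For example, if module avail lists a ROCm module, you could load it and verify that the allocated GPUs are visible; the module name and version below are hypothetical, so pick one from your module avail output:
module load rocm/6.1.0   # hypothetical; use a module listed by module avail
rocm-smi                 # should list the allocated GPU(s)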
If you are using MobaXterm or PuTTY, follow the Connecting to AAC Environment guide.