Ensuring Persistent SSH Sessions on AAC

For users working on AAC systems over SSH, an unstable connection or a temporary disconnection can halt progress, and long-running tasks can be cut off when the SSH session reaches its default time limit. Tmux and Screen solve this by creating persistent terminal sessions that stay active on the server even after you disconnect or log out. This guide shows how to use either tool to keep a workflow running without interruption.

End-to-End Flow Using Tmux and Screen: Accessing the Cluster

Scenario

You need to access a compute cluster from an AMD remote server. The process involves:

  • SSH Login: Establishing a secure shell connection to the cluster.
  • Allocate a Node: Requesting a compute node for your workloads.
  • Load ROCm Environment: Setting up the necessary environment for running ROCm applications.

Step 1: SSH Login

Use the following SSH command to connect to the cluster:

ssh -i <priv key> <username>@<cluster domain name, e.g., aac1.amd.com>

Replace <priv key> with the path to your private key file and <username> with your actual username.

Step 2: Initiating a Session

Using Tmux

Start a new Tmux session:

tmux new -s cluster_access

Using Screen

Start a new Screen session:

screen -S cluster_access
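
When reconnecting often, it can be convenient to create the session only if it does not already exist. The sketch below assumes tmux is installed on the login node. The function name start_or_attach is hypothetical and not part of tmux; it relies on tmux has-session, which returns a non-zero status when no matching session is found.

```shell
# Hypothetical helper: attach to a named tmux session, creating it if needed.
start_or_attach() {
    session="${1:-cluster_access}"   # default matches the name used in this guide
    if tmux has-session -t "$session" 2>/dev/null; then
        tmux attach -t "$session"
    else
        tmux new -s "$session"
    fi
}
```

Calling start_or_attach cluster_access right after logging in would then cover both the first login and later reconnects.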

Step 3: Allocate a Node

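Use the salloc command to allocate the necessary resources. In the example below, --mem=0 requests all of the memory on the node and --gres=gpu:8 requests eight GPUs.

For Exclusive Access to One Node with All 8 GPUs:

salloc --mem=0 --gres=gpu:8 --reservation=<reservation, e.g., s30-05_Reservation>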
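
Before launching work, it can help to confirm that the shell is actually running inside an allocation. A minimal sketch, assuming a Slurm cluster: salloc exports SLURM_JOB_ID into the shell it starts, so an empty value suggests the allocation was not granted or has already ended. The function name in_allocation is hypothetical.

```shell
# Hypothetical helper: succeed only when the shell is inside a Slurm allocation.
in_allocation() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_allocation; then
    echo "Inside Slurm job ${SLURM_JOB_ID}"
else
    echo "No active allocation in this shell"
fi
```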

Step 4: Load ROCm Environment

Once you have a shell on the allocated node, list the available modules and load the ROCm module you need:

module avail
module load <modulefile>
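
After loading a module, a quick check that the ROCm tools are on PATH can catch a wrong or missing module early. The sketch assumes the loaded module provides rocminfo and hipcc, which ship with standard ROCm installations; the exact set of tools depends on the module, and the helper name have_tool is hypothetical.

```shell
# Hypothetical helper: succeed when the named command can be found.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Tool names are an assumption; adjust to what your ROCm module provides.
for tool in rocminfo hipcc; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found (is the ROCm module loaded?)"
    fi
done
```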

Step 5: Leaving the Session

To leave the session without terminating it:

For Tmux

Ctrl + b, then d

For Screen

Ctrl + a, then d

Step 6: Come Back and Check Previous Sessions

After logging back in over SSH, list your existing sessions:

For Tmux

tmux ls

For Screen

screen -ls

Step 7: Attach to Previous Session

To reattach to your session:

For Tmux

tmux attach -t cluster_access

For Screen

screen -r cluster_access
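
Attaching from inside an existing session nests one multiplexer inside another, which makes the detach keys confusing. A small sketch for checking first: tmux sets the TMUX environment variable inside its sessions and Screen sets STY. The function name inside_multiplexer is hypothetical.

```shell
# Hypothetical helper: report whether this shell already runs inside tmux or Screen.
inside_multiplexer() {
    if [ -n "${TMUX:-}" ]; then
        echo "tmux"
    elif [ -n "${STY:-}" ]; then
        echo "screen"
    else
        echo "none"
    fi
}

inside_multiplexer
```

If this prints none, it is safe to run the attach command for your tool.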

Step 8: Monitor Progress

Once reattached, you can continue to monitor your workload.
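
Output that scrolled past while you were detached is easier to review if the workload also writes to a log file. A minimal sketch, assuming a POSIX shell: the helper name run_logged is hypothetical, and it simply pipes the command's output through tee.

```shell
# Hypothetical helper: run a command, showing its output and appending it to a log.
run_logged() {
    logfile="$1"
    shift
    # 2>&1 captures error messages too; tee -a appends rather than overwrites.
    "$@" 2>&1 | tee -a "$logfile"
}
```

For example, run_logged workload.log ./my_workload (both names are placeholders) would start the workload inside the session, and after reattaching, tail -f workload.log follows the output.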

Step 9: Terminate the Session

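After your workload finishes, exit the session. If you are still inside the shell started by salloc, the first exit releases the allocation; run exit again to close the Tmux or Screen session: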

exit
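
A detached session can also be ended from outside without reattaching. The sketch below uses tmux kill-session and Screen's -X quit; the wrapper end_session is hypothetical. Anything still running in the session is terminated, so use it only when the workload is done.

```shell
# Hypothetical helper: end a detached session by name.
end_session() {
    tool="$1"      # "tmux" or "screen"
    session="$2"
    case "$tool" in
        tmux)   tmux kill-session -t "$session" ;;
        screen) screen -S "$session" -X quit ;;
        *)      echo "unknown tool: $tool" >&2; return 2 ;;
    esac
}
```

For the session created in this guide, that would be end_session tmux cluster_access or end_session screen cluster_access.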