
How to Use Podman on AMD Accelerator Cloud to Debug an Application with rocgdb

Introduction

This document describes the steps to allocate and SSH to an AMD GPU compute node on the Slurm cluster, run a ROCm Docker image with Podman, and use rocgdb to debug sources that live outside the container under the $HOME directory, which is mounted as /workdir inside the container.

Steps to Run rocgdb Under Podman on an AAC Compute Node

Allocate and SSH to an AMD GPU Compute Node

From the Slurm login node, the following example shows the command used to allocate and SSH to an AAC compute node.

ssubrama1@pl1vm1pctlnode02:~$ salloc -N 1 --exclusive --mem=0 --gres=gpu:8 -p 1CN96C8G1H_4IB_MI250_Ubuntu22
salloc: Granted job allocation 59093
salloc: Waiting for resource configuration
salloc: Nodes ubb-r09-13 are ready for job
ssubrama1@ubb-r09-13:~$
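
Optionally, the allocation can be verified before launching any containers. The following check is a sketch and assumes the standard Slurm client tools are available in the shell.

# List this user's active Slurm jobs; the allocation granted above should show as running.
squeue -u $USER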

Run ROCm 6.1.2 Docker Image Using Podman

The following command launches the amddcgpuce/rocm:6.1.2-ub22 Docker image in interactive mode using Podman, mounting the $HOME directory as /workdir and invoking bash at startup.

ssubrama1@ubb-r09-13:~$ podman run -it --privileged --network=host --ipc=host \
    -v $HOME:/workdir -v /shareddata:/shareddata -v /shareddata.ai:/shareddata.ai \
    -v /shared/apps:/shared/apps --workdir /workdir \
    docker://amddcgpuce/rocm:6.1.2-ub22 bash
Trying to pull docker.io/amddcgpuce/rocm:6.1.2-ub22...
Getting image source signatures
Copying blob f9bc1003674c done
Copying blob 7a0f50daed85 done
Copying blob 92922ca93c2a done
Copying blob aece8493d397 done
Copying config dd336a3e06 done
Writing manifest to image destination
Storing signatures
root@ubb-r09-13:/workdir#
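
The command above uses --privileged for simplicity. Where a less permissive container is preferred, the ROCm device nodes can be passed in explicitly instead; the following variant is a sketch, and the exact --device and --group-add values are assumptions that depend on how the compute node is configured.

# Expose only the GPU device nodes instead of running fully privileged.
podman run -it --device=/dev/kfd --device=/dev/dri \
    --security-opt seccomp=unconfined --group-add video --group-add render \
    --network=host --ipc=host -v $HOME:/workdir --workdir /workdir \
    docker://amddcgpuce/rocm:6.1.2-ub22 bash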

Compile and Debug the Sample Application hellowgpu.cpp Using the Docker Environment (ROCm 6.1.2)

The following example shows the user navigating to the src/ directory under $HOME (mounted inside the Podman environment as /workdir), compiling the hellowgpu.cpp sample application, launching rocgdb to debug it, and exiting the debugger.
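
The contents of hellowgpu.cpp are not reproduced in this document. Any small HIP program with a main(argc, argv) entry point produces a comparable session; the listing below is a minimal illustrative sketch created from /workdir/src, not the actual sample file.

# Write a minimal HIP "hello" program to debug (illustrative only).
cat > hellowgpu.cpp <<'EOF'
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

// Trivial kernel: each GPU thread prints its index.
__global__ void hello_kernel() {
    printf("Hello from GPU thread %d\n", (int)threadIdx.x);
}

int main(int argc, char *argv[]) {
    int nthreads = 4;                  // default number of GPU threads
    if (argc > 1)                      // optionally override from the command line
        nthreads = std::atoi(argv[1]);

    // Launch one block of nthreads threads and wait for the device printf output.
    hipLaunchKernelGGL(hello_kernel, dim3(1), dim3(nthreads), 0, 0);
    hipDeviceSynchronize();
    return 0;
}
EOF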

root@ubb-r09-13:/workdir# cd src/
root@ubb-r09-13:/workdir/src# ls
a.out hello.c hellowgpu.cpp python
root@ubb-r09-13:/workdir/src# which hipcc
/opt/rocm-6.1.2/bin/hipcc
root@ubb-r09-13:/workdir/src# hipcc -g hellowgpu.cpp
root@ubb-r09-13:/workdir/src# rocgdb ./a.out
GNU gdb (rocm-rel-5.7-98) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://github.com/ROCm-Developer-Tools/ROCgdb/issues>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...
(gdb) break main
Breakpoint 1 at 0x20e7e7: file hellowgpu.cpp, line 20.
(gdb) run
Starting program: /workdir/src/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, main (argc=1, argv=0x7fffffffe598) at hellowgpu.cpp:20
20 if (argc > 1)
(gdb) bt
#0 main (argc=1, argv=0x7fffffffe598) at hellowgpu.cpp:20
(gdb) quit
A debugging session is active.
Inferior 1 [process 261] will be killed.
Quit anyway? (y or n) y
root@ubb-r09-13:/workdir/src#
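
The interactive break/run/backtrace sequence above can also be scripted for repeated runs using standard GDB batch options, which rocgdb inherits; a minimal sketch:

# Non-interactive equivalent of the session above: break at main, run, and print a backtrace.
rocgdb --batch -ex "break main" -ex "run" -ex "bt" ./a.out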

Exit the Podman Session to the Native Slurm Module Environment

The following command shows exiting the ROCm 6.1.2 Podman environment and returning to the native SSH shell prompt on the compute node.

root@ubb-r09-13:/workdir/src# exit
exit
ssubrama1@ubb-r09-13:~$

Release the Allocated AMD GPU Compute Node to Terminate the Session

The following command shows the release of the allocated AMD GPU compute node back to the Slurm queue, returning the user to the Slurm login node shell environment.

ssubrama1@ubb-r09-13:~$ exit
exit
Job Ended Successfully
salloc: Relinquishing job allocation 59093
ssubrama1@pl1vm1pctlnode02:~$