GPU Partitioning Modes

AMD Instinct MI325X and MI355X GPUs support multiple compute partitioning modes that divide physical GPUs into logical partitions. This allows you to run multiple workloads on the same physical GPU or optimize resource allocation for specific use cases.

Available GPU partitioning modes

Mode	Description	Logical GPUs per Physical GPU	Physical GPUs per Node	Total Logical GPUs per Node
SPX	Single Partition X-celerator	1	8	8
DPX	Dual Partition X-celerator	2	8	16
QPX	Quad Partition X-celerator	4	8	32
CPX	Core Partitioned X-celerator	8	8	64

Mode details

SPX (Single Partition X-celerator)
- Default mode
- Each physical GPU appears as 1 logical GPU
- Full GPU memory and compute capacity available per logical GPU
- Best for: Large model training, workloads requiring maximum memory

DPX (Dual Partition X-celerator)
- Each physical GPU is divided into 2 logical GPUs
- Each logical GPU gets ~50% of the memory and compute capacity
- Best for: Running 2 independent medium-sized workloads per physical GPU

QPX (Quad Partition X-celerator)
- Each physical GPU is divided into 4 logical GPUs
- Each logical GPU gets ~25% of the memory and compute capacity
- Best for: Smaller workloads, maximizing multi-tenancy, inference serving

CPX (Core Partitioned X-celerator)
- Each physical GPU is divided into 8 logical GPUs, the finest split available
- Each logical GPU gets ~12.5% of the compute capacity
- Best for: Specific compute-intensive workloads; consult cluster documentation for supported use cases

NUMA modes

NUMA (Non-Uniform Memory Access) modes control how CPU cores and memory are grouped relative to GPU placement. In NPSn (NUMA nodes Per Socket), n is the number of NUMA domains each CPU socket is split into.

Availability:
- MI325X: NPS1, NPS2, and NPS4 are configurable
- MI355X: Fixed NUMA configuration (not user-configurable)

Mode	NUMA Domains per Socket	Total NUMA Domains	Availability
NPS1	1	2	MI325X only
NPS2	2	4	MI325X only
NPS4	4	8	MI325X only

When to use NUMA modes

NPS1: Simplest configuration; good for most workloads
NPS2: Balanced; good for multi-GPU workloads with moderate NUMA sensitivity
NPS4: Fine-grained control; best for workloads highly sensitive to memory locality
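To confirm that an allocated node is actually running the NPS mode you asked for, you can compare the NUMA domain count the OS reports with the count in the table above. The sketch below is a minimal bash example under stated assumptions: the helper names expected_numa_domains and numa_node_count are hypothetical, the default of 2 sockets per node is inferred from the "Total NUMA Domains" column, and the parser assumes lscpu prints a line of the form "NUMA node(s): N".

```shell
# Hypothetical helpers for checking the NPS mode of a node (bash).
# Assumption: 2 CPU sockets per node, matching the NUMA table
# (NPS1 -> 2, NPS2 -> 4, NPS4 -> 8 domains). Pass another socket count if needed.

# expected_numa_domains <nps1|nps2|nps4> [sockets]
expected_numa_domains() {
  local mode="$1" sockets="${2:-2}" per_socket
  case "$mode" in
    nps1) per_socket=1 ;;
    nps2) per_socket=2 ;;
    nps4) per_socket=4 ;;
    *) echo "unknown NPS mode: $mode" >&2; return 1 ;;
  esac
  echo $(( per_socket * sockets ))
}

# numa_node_count: read `lscpu` output on stdin and print the value of
# the "NUMA node(s):" line (assumed lscpu output format).
numa_node_count() {
  awk -F: '/^NUMA node\(s\)/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Example on an allocated node (not run here):
#   lscpu | numa_node_count        # actual domain count
#   expected_numa_domains nps2     # 4 on a 2-socket node
```

If the two numbers differ, the node is not in the NPS mode you expected, or the socket assumption does not hold for that node type.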

Requesting GPU partitioning modes with Slurm

Each mode is exposed as a Slurm node feature (for example spx, dpx, qpx, or nps2). Request a mode by passing its feature name to the --constraint= flag of salloc, sbatch, or srun.

Basic examples

Request SPX mode (full GPUs):

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:8 --mem=0 --exclusive --constraint=spx --account=<ACCOUNT_NAME>

Example:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:8 --mem=0 --exclusive --constraint=spx --account=myteam

Request DPX mode (dual partitions):

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:16 --mem=0 --exclusive --constraint=dpx --account=<ACCOUNT_NAME>

Example:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:16 --mem=0 --exclusive --constraint=dpx --account=myteam

Request QPX mode (quad partitions):

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:32 --mem=0 --exclusive --constraint=qpx --account=<ACCOUNT_NAME>

Example:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:32 --mem=0 --exclusive --constraint=qpx --account=myteam

Important: Set --gres=gpu:N to the total number of logical GPUs on the node:
- SPX: 8 physical × 1 = --gres=gpu:8
- DPX: 8 physical × 2 = --gres=gpu:16
- QPX: 8 physical × 4 = --gres=gpu:32
- CPX: 8 physical × 8 = --gres=gpu:64
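The logical GPU counts in the mode table can be turned into a --gres value with a small helper, which avoids requesting gpu:8 on a partitioned node by mistake. This is a minimal bash sketch; gres_for_mode is a hypothetical name, and the default of 8 physical GPUs per node is taken from the table above.

```shell
# Hypothetical helper: compute the --gres value for a partitioning mode (bash).
# Logical GPUs per physical GPU: spx=1, dpx=2, qpx=4, cpx=8 (from the mode table).

# gres_for_mode <spx|dpx|qpx|cpx> [physical_gpus_per_node]
gres_for_mode() {
  local mode="$1" physical="${2:-8}" per_gpu
  case "$mode" in
    spx) per_gpu=1 ;;
    dpx) per_gpu=2 ;;
    qpx) per_gpu=4 ;;
    cpx) per_gpu=8 ;;
    *) echo "unknown partitioning mode: $mode" >&2; return 1 ;;
  esac
  echo "gpu:$(( per_gpu * physical ))"
}

# Example (not run here):
#   salloc -p 256C8G1H_MI325X_Ubuntu22 --gres="$(gres_for_mode qpx)" \
#     --mem=0 --exclusive --constraint=qpx --account=<ACCOUNT_NAME>
```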

NUMA mode examples (MI325X only)

Request NPS1 mode:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:8 --mem=0 --exclusive --constraint=nps1 --account=<ACCOUNT_NAME>

Request NPS2 mode:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:8 --mem=0 --exclusive --constraint=nps2 --account=<ACCOUNT_NAME>

Request NPS4 mode:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:8 --mem=0 --exclusive --constraint=nps4 --account=<ACCOUNT_NAME>

Combining constraints

You can combine GPU partitioning and NUMA modes:

# DPX mode with NPS2 NUMA
salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:16 --mem=0 --exclusive --constraint="dpx&nps2" --account=<ACCOUNT_NAME>

Example:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:16 --mem=0 --exclusive --constraint="dpx&nps2" --account=myteam
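Slurm's "&" operator requires a node to have every listed feature. The quotes in --constraint="dpx&nps2" matter on the command line: without them, the shell treats & as the background operator. The bash sketch below builds such an expression and refuses to add an NPS feature on MI355X partitions, since NUMA modes are MI325X only. build_constraint is a hypothetical helper, and detecting MI355X by the substring "MI355X" in the partition name is an assumption based on the partition names used on this page.

```shell
# Hypothetical helper: AND a GPU partitioning mode with an optional NUMA mode (bash).
# Assumption: partition names containing "MI355X" have a fixed NUMA configuration.

# build_constraint <partition> <spx|dpx|qpx|cpx> [nps1|nps2|nps4]
build_constraint() {
  local partition="$1" gpu_mode="$2" numa_mode="${3:-}"
  if [[ -z "$numa_mode" ]]; then
    echo "$gpu_mode"            # GPU partitioning mode only
    return 0
  fi
  if [[ "$partition" == *MI355X* ]]; then
    echo "NUMA modes are not configurable on MI355X partitions" >&2
    return 1
  fi
  echo "${gpu_mode}&${numa_mode}"   # Slurm AND of both features
}

# Example (not run here):
#   p=256C8G1H_MI325X_Ubuntu22
#   salloc -p "$p" --gres=gpu:16 --mem=0 --exclusive \
#     --constraint="$(build_constraint "$p" dpx nps2)" --account=<ACCOUNT_NAME>
```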

Run two independent jobs on one node with DPX

Allocate a node in DPX mode, then split the 16 logical GPUs between two jobs:

salloc -N 1 -p 256C8G1H_MI355X_Ubuntu22 \
  --exclusive \
  --mem=0 \
  --gres=gpu:16 \
  --constraint=dpx \
  --account=<ACCOUNT_NAME>

From within the allocation, run two separate jobs:

# Job 1 using logical GPUs 0-7
ROCR_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train_model_a.py &

# Job 2 using logical GPUs 8-15
ROCR_VISIBLE_DEVICES=8,9,10,11,12,13,14,15 python train_model_b.py &

# Wait for both background jobs to finish before leaving the allocation
wait
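The same split generalizes to any number of equal groups, for example four jobs of 8 logical GPUs each on a QPX node. The bash sketch below builds the ROCR_VISIBLE_DEVICES list for one contiguous group. device_list is a hypothetical helper, and it only divides logical indices; it does not check which physical GPU backs each logical index.

```shell
# Hypothetical helper: build a ROCR_VISIBLE_DEVICES list for one of
# <num_groups> equal, contiguous groups of logical GPUs (bash).

# device_list <total_logical_gpus> <num_groups> <group_index>
device_list() {
  local total="$1" groups="$2" index="$3"
  if (( groups <= 0 || total % groups != 0 || index < 0 || index >= groups )); then
    echo "cannot split $total GPUs into $groups equal groups (index $index)" >&2
    return 1
  fi
  local size=$(( total / groups ))
  local start=$(( index * size ))
  seq -s, "$start" $(( start + size - 1 ))   # e.g. 8,9,...,15
}

# Example inside a DPX allocation (not run here):
#   ROCR_VISIBLE_DEVICES=$(device_list 16 2 0) python train_model_a.py &
#   ROCR_VISIBLE_DEVICES=$(device_list 16 2 1) python train_model_b.py &
#   wait
```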

Batch job with specific constraints

Submit a batch job requesting DPX mode and NPS2 NUMA:

#!/bin/bash
#SBATCH -J my_job
#SBATCH -p 256C8G1H_MI325X_Ubuntu22
#SBATCH --gres=gpu:16
#SBATCH --constraint=dpx&nps2
#SBATCH --account=<ACCOUNT_NAME>
#SBATCH --exclusive
#SBATCH --mem=0
#SBATCH -N 1

module load rocm/7.2.0
python my_training_script.py

Verifying GPU partitioning mode

After allocating a node, verify the GPU configuration:

# Load ROCm
module load rocm/7.2.0

# Check GPU product and count
rocm-smi --showproductname

# Check GPU architecture details
rocminfo | grep -E "Name:|gfx"

# List all visible GPUs
rocm-smi
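A quick way to confirm the mode is to count the GPU agents that rocminfo reports and compare the result with the logical GPU count you requested (16 for DPX, 32 for QPX). This bash sketch assumes each logical GPU appears as its own agent and that every agent block contains a line such as "Device Type:             GPU" (CPU agents report CPU). check_logical_gpus is a hypothetical helper.

```shell
# Hypothetical helper: count GPU agents in `rocminfo` output read on stdin
# and compare with the expected number of logical GPUs (bash).
# Assumption: each GPU agent has a "Device Type: GPU" line.

# check_logical_gpus <expected_count>
check_logical_gpus() {
  local expected="$1" found
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the 0
  found=$(grep -cE '^[[:space:]]*Device Type:[[:space:]]+GPU[[:space:]]*$' || true)
  if [[ "$found" -eq "$expected" ]]; then
    echo "OK: $found logical GPUs visible"
  else
    echo "MISMATCH: expected $expected logical GPUs, found $found" >&2
    return 1
  fi
}

# Example inside a DPX allocation (not run here):
#   rocminfo | check_logical_gpus 16
```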

Checking available features

To see which modes are available on cluster nodes:

# Show all node features
sinfo -o "%N %f"

# Show detailed node information
scontrol show node <node_name>

# List nodes with specific constraint
sinfo -o "%N %f" | grep spx
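Piping sinfo through grep spx matches the text anywhere on the line, not only a whole feature name. The bash sketch below matches exact comma-separated feature names and prints each node once with its state, so you can see whether any node offering the mode is idle. nodes_with_feature is a hypothetical helper; it expects the output of sinfo -N -h -o "%N %f %t" (one node per line, no header) on stdin.

```shell
# Hypothetical helper: list nodes whose feature list contains an exact
# feature name, with their state (bash + awk).
# Input: `sinfo -N -h -o "%N %f %t"` output on stdin.

# nodes_with_feature <feature>
nodes_with_feature() {
  awk -v want="$1" '
    {
      n = split($2, feats, ",")
      for (i = 1; i <= n; i++) {
        # print each node only once, even if it appears in several partitions
        if (feats[i] == want && !seen[$1]++) { print $1, $3; break }
      }
    }
  '
}

# Example (not run here): idle nodes that offer DPX
#   sinfo -N -h -o "%N %f %t" | nodes_with_feature dpx | awk '$2 == "idle"'
```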

Use case recommendations

Large Model Training (FSDP, DeepSpeed, Megatron):
- Mode: SPX
- NUMA: NPS1
- Why: Maximum memory per GPU, simplest NUMA configuration

Inference Serving (vLLM, SGLang, TGI):
- Mode: QPX or CPX
- NUMA: NPS2 or NPS4
- Why: Better multi-tenancy; finer-grained NUMA can lower latency by keeping CPU threads and memory close to each GPU

Multi-tenant Workloads:
- Mode: DPX or QPX
- Why: Run multiple independent jobs on the same physical hardware

MPI/RCCL Benchmarks:
- Mode: SPX
- NUMA: NPS1
- Why: Clean baseline performance
- See: Run RCCL Tests

Important notes

GPU partitioning and NUMA modes are set per node at the hardware/BIOS level; a job cannot change them, so request a node that already runs the mode you need
If no node with your requested mode is idle, your job stays pending until one becomes available
Check feature availability with sinfo -N -o "%N %f %t" before submitting jobs
NUMA mode configuration is only available on MI325X; MI355X nodes use a fixed NUMA configuration

Related documentation

Node Reference Guide - Complete node specifications
Clusters at a Glance - MI325X vs MI355X comparison
AAC Slurm Cluster User Guide - General cluster usage
Run RCCL Tests - Performance benchmarking
