GPU Partitioning Modes

AMD Instinct MI325X and MI355X GPUs support multiple compute partitioning modes that divide physical GPUs into logical partitions. This allows you to run multiple workloads on the same physical GPU or optimize resource allocation for specific use cases.

Available GPU partitioning modes

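| Mode | Description | Logical GPUs per Physical GPU | Physical GPUs on Node | Total Logical GPUs |
|------|-------------|-------------------------------|-----------------------|--------------------|
| SPX | Single Partition X-celerator | 1 | 8 | 8 |
| DPX | Dual Partition X-celerator | 2 | 8 | 16 |
| QPX | Quad Partition X-celerator | 4 | 8 | 32 |
| CPX | Core Partitioned X-celerator | 8 | 8 | 64 |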

Mode details

SPX (Single Partition X-celerator)

  • Default mode
  • Each physical GPU appears as 1 logical GPU
  • Full GPU memory and compute capacity available per logical GPU
  • Best for: Large model training, workloads requiring maximum memory

DPX (Dual Partition X-celerator)

  • Each physical GPU is divided into 2 logical GPUs
  • Each logical GPU gets ~50% of memory and compute capacity
  • Best for: Running 2 independent medium-sized workloads per physical GPU

QPX (Quad Partition X-celerator)

  • Each physical GPU is divided into 4 logical GPUs
  • Each logical GPU gets ~25% of memory and compute capacity
  • Best for: Smaller workloads, maximizing multi-tenancy, inference serving

CPX (Core Partitioned X-celerator)

  • Each physical GPU is divided into 8 logical GPUs
  • Optimized for specific compute-intensive workloads
  • Consult cluster documentation for specific use cases

NUMA modes

NUMA (Non-Uniform Memory Access) modes control how CPU cores and memory are grouped relative to GPU placement.

Availability:

  • MI325X: NPS1, NPS2, NPS4 configurable
  • MI355X: Fixed NUMA configuration (not user-configurable)

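| Mode | NUMA Domains per Socket | Total NUMA Domains | Availability |
|------|-------------------------|--------------------|--------------|
| NPS1 | 1 | 2 | MI325X only |
| NPS2 | 2 | 4 | MI325X only |
| NPS4 | 4 | 8 | MI325X only |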

When to use NUMA modes

  • NPS1: Simplest configuration; good for most workloads
  • NPS2: Balanced; good for multi-GPU workloads with moderate NUMA sensitivity
  • NPS4: Fine-grained control; best for workloads highly sensitive to memory locality

Requesting GPU partitioning modes with Slurm

Use the --constraint= flag with salloc, sbatch, or srun to request specific modes.

Basic examples

Request SPX mode (full GPUs):

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:8 --mem=0 --exclusive --constraint=spx --account=<ACCOUNT_NAME>

Example:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:8 --mem=0 --exclusive --constraint=spx --account=myteam

Request DPX mode (dual partitions):

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:16 --mem=0 --exclusive --constraint=dpx --account=<ACCOUNT_NAME>

Example:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:16 --mem=0 --exclusive --constraint=dpx --account=myteam

Request QPX mode (quad partitions):

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:32 --mem=0 --exclusive --constraint=qpx --account=<ACCOUNT_NAME>

Example:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:32 --mem=0 --exclusive --constraint=qpx --account=myteam

Important: Adjust --gres=gpu:N to match the total number of logical GPUs on the node:

  • SPX: 8 physical × 1 = --gres=gpu:8
  • DPX: 8 physical × 2 = --gres=gpu:16
  • QPX: 8 physical × 4 = --gres=gpu:32
  • CPX: 8 physical × 8 = --gres=gpu:64
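
The arithmetic above can be wrapped in a small shell helper so the --gres value always matches the requested mode. This is a minimal sketch, not part of the cluster tooling: the function name gres_for_mode is hypothetical, and the default of 8 physical GPUs per node is taken from the table earlier on this page.

```shell
# Hypothetical helper: print the --gres=gpu:N count for one node.
# Assumes 8 physical GPUs per node unless a second argument overrides it.
gres_for_mode() {
  local mode physical per_gpu
  # Accept the mode in any letter case (spx, SPX, ...)
  mode=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  physical="${2:-8}"
  case "$mode" in
    spx) per_gpu=1 ;;
    dpx) per_gpu=2 ;;
    qpx) per_gpu=4 ;;
    cpx) per_gpu=8 ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
  echo $(( physical * per_gpu ))
}

# Example use when building a request:
#   salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:"$(gres_for_mode dpx)" \
#     --mem=0 --exclusive --constraint=dpx --account=<ACCOUNT_NAME>
```

Deriving both values from one mode variable avoids the common mismatch of requesting, for example, --constraint=qpx together with --gres=gpu:8.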

NUMA mode examples (MI325X only)

Request NPS1 mode:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:8 --mem=0 --exclusive --constraint=nps1 --account=<ACCOUNT_NAME>

Request NPS2 mode:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:8 --mem=0 --exclusive --constraint=nps2 --account=<ACCOUNT_NAME>

Request NPS4 mode:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:8 --mem=0 --exclusive --constraint=nps4 --account=<ACCOUNT_NAME>
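
Once the allocation starts, the NUMA layout can be compared against the table above. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the expected totals assume the two-socket nodes described in the NUMA table, and the check assumes lscpu is installed on the compute node and that no other devices contribute extra NUMA nodes.

```shell
# Expected total NUMA domains per node, taken from the NUMA table above.
expected_numa_domains() {
  case "$1" in
    nps1) echo 2 ;;
    nps2) echo 4 ;;
    nps4) echo 8 ;;
    *) echo "unknown NUMA mode: $1" >&2; return 1 ;;
  esac
}

# Read lscpu output on stdin and print the value of the "NUMA node(s):" line.
numa_nodes_from_lscpu() {
  # Adding 0 converts the padded field to a plain number
  awk -F: '/^NUMA node\(s\)/ { print $2 + 0; exit }'
}

# Example use inside an allocation requested with --constraint=nps2:
#   actual=$(lscpu | numa_nodes_from_lscpu)
#   [ "$actual" = "$(expected_numa_domains nps2)" ] \
#     || echo "unexpected NUMA layout: $actual domains" >&2
```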

Combining constraints

You can combine GPU partitioning and NUMA modes:

# DPX mode with NPS2 NUMA
salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:16 --mem=0 --exclusive --constraint="dpx&nps2" --account=<ACCOUNT_NAME>

Example:

salloc -p 256C8G1H_MI325X_Ubuntu22 --gres=gpu:16 --mem=0 --exclusive --constraint="dpx&nps2" --account=myteam

Run two independent jobs on one node with DPX

Allocate a node in DPX mode, then split the 16 logical GPUs between two jobs:

salloc -N 1 -p 256C8G1H_MI355X_Ubuntu22 \
  --exclusive \
  --mem=0 \
  --gres=gpu:16 \
  --constraint=dpx \
  --account=<ACCOUNT_NAME>

From within the allocation, run two separate jobs:

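# Job 1 using logical GPUs 0-7 (runs in the background)
ROCR_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train_model_a.py &

# Job 2 using logical GPUs 8-15 (runs in the background)
ROCR_VISIBLE_DEVICES=8,9,10,11,12,13,14,15 python train_model_b.py &

# Block until both background jobs have finished
wait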
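
For QPX or CPX nodes the device lists get long, so it can help to generate them. The following is a minimal sketch; device_range is a hypothetical helper name, and it assumes logical GPUs are numbered contiguously from 0 as in the example above.

```shell
# Hypothetical helper: print a comma-separated list of device indices
# from first to last, inclusive.
device_range() {
  local i="$1" last="$2" out=""
  while [ "$i" -le "$last" ]; do
    out="${out:+$out,}$i"   # prepend a comma only after the first entry
    i=$(( i + 1 ))
  done
  echo "$out"
}

# Example use, equivalent to the two hand-written lists above:
#   ROCR_VISIBLE_DEVICES=$(device_range 0 7)  python train_model_a.py &
#   ROCR_VISIBLE_DEVICES=$(device_range 8 15) python train_model_b.py &
#   wait
```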

Batch job with specific constraints

Submit a batch job requesting DPX mode and NPS2 NUMA:

#!/bin/bash
#SBATCH -J my_job
#SBATCH -p 256C8G1H_MI325X_Ubuntu22
#SBATCH --gres=gpu:16
#SBATCH --constraint="dpx&nps2"
#SBATCH --account=<ACCOUNT_NAME>
#SBATCH --mem=0
#SBATCH -N 1

module load rocm/7.2.0
python my_training_script.py

Verifying GPU partitioning mode

After allocating a node, verify the GPU configuration:

# Load ROCm
module load rocm/7.2.0

# Check GPU product and count
rocm-smi --showproductname

# Check GPU architecture details
rocminfo | grep -E "Name:|gfx"

# List all visible GPUs
rocm-smi
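
To confirm that the number of logical GPUs matches the requested mode, the GPU agents reported by rocminfo can be counted and compared with the totals in the partitioning table. This is a sketch under assumptions: count_gpu_agents is a hypothetical helper, and it relies on rocminfo printing one "Device Type: GPU" line per GPU agent.

```shell
# Hypothetical helper: read rocminfo output on stdin and print the number
# of GPU agents (CPU agents are reported as "Device Type: CPU" and skipped).
count_gpu_agents() {
  grep -cE 'Device Type:[[:space:]]+GPU'
}

# Example use inside a DPX allocation, where 16 logical GPUs are expected:
#   n=$(rocminfo | count_gpu_agents)
#   [ "$n" -eq 16 ] || echo "expected 16 logical GPUs, found $n" >&2
#
# If the installed rocm-smi version supports it, the current mode can also
# be printed directly:
#   rocm-smi --showcomputepartition
```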

Checking available features

To see which modes are available on cluster nodes:

# Show all node features
sinfo -o "%N %f"

# Show detailed node information
scontrol show node <node_name>

# List nodes with specific constraint
sinfo -o "%N %f" | grep spx
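
A plain grep matches substrings anywhere on the line, including node names. The sketch below matches a feature only as a whole entry in the comma-separated feature list. The helper name nodes_with_feature is hypothetical, and it assumes feature names appear in sinfo exactly as they are written in --constraint (for example dpx or nps2).

```shell
# Hypothetical helper: read "NODE FEATURES" lines on stdin and print the
# nodes whose comma-separated feature list contains the given feature.
nodes_with_feature() {
  awk -v want="$1" '{
    n = split($2, feats, ",")
    for (i = 1; i <= n; i++) {
      if (feats[i] == want) { print $1; break }
    }
  }'
}

# Example use (-h suppresses the header, -N prints one line per node):
#   sinfo -h -N -o "%N %f" | nodes_with_feature dpx | sort -u
```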

Use case recommendations

Large Model Training (FSDP, DeepSpeed, Megatron):

  • Mode: SPX
  • NUMA: NPS1
  • Why: Maximum memory per GPU, simplest NUMA configuration

Inference Serving (vLLM, SGLang, TGI):

  • Mode: QPX or CPX
  • NUMA: NPS2 or NPS4
  • Why: Better multi-tenancy, improved latency with fine-grained NUMA

Multi-tenant Workloads:

  • Mode: DPX or QPX
  • Why: Run multiple independent jobs on the same physical hardware

MPI/RCCL Benchmarks:

  • Mode: SPX
  • NUMA: NPS1
  • Why: Clean baseline performance
  • See: Run RCCL Tests

Important notes

  • GPU partitioning and NUMA modes are set per node at the hardware/BIOS level
  • If no node with your requested mode is idle, your job stays pending until one becomes available
  • Check feature availability with sinfo -N -o "%N %f %t" before submitting jobs
  • NUMA mode configuration is only available on MI325X; MI355X has a fixed NUMA configuration