Storage and Shared Filesystems

This page explains the shared filesystem layout available on AAC clusters (MI325X and MI355X).

Shared filesystem paths

Both MI325X and MI355X clusters provide the following shared storage locations accessible from all compute nodes via NFS:

$HOME (/shared/amdgpu/home/<username>)

Purpose: User-specific home directory.

Access: Private to each user (read/write).

Location: NFS-backed at /shared/amdgpu/home/<username>, visible on login and all compute nodes.

Characteristics:

- Mounted automatically on all nodes
- Persists across sessions and job allocations
- Suitable for personal scripts, source code, and small datasets
- Backed up (check with your administrator for backup policies)

Best practices:

- Keep source code and personal scripts in $HOME
- Mount $HOME into containers as /workdir for seamless access
- Avoid storing very large datasets in $HOME (use /shared/data instead)

Example:

# When running containers, mount $HOME as /workdir:
podman run -v "$HOME":/workdir --workdir /workdir docker://rocm/pytorch-training:v25.5

/shared/apps

Purpose: Shared applications, libraries, and software installations.

Access: Read-only for regular users; managed by administrators.

Use cases:

- ROCm module tree: /shared/apps/modules/ubuntu/modulefiles
- Custom-built applications and tools
- Shared Python virtual environments
- Pre-installed frameworks and libraries
- Environment setup scripts (e.g., aac.modules.bash)

Example:

ls /shared/apps/modules/ubuntu/modulefiles/
# Example contents: rocm/7.2.0, gcc/, python/
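To make the module tree above usable in a shell session, something like the following sketch may help. It assumes the cluster provides a `module` command (Environment Modules or Lmod); the helper name `load_shared_module` is hypothetical, and `rocm/7.2.0` is taken only from the example listing above.

```shell
# Sketch: add the shared module tree to the search path and load one module.
# Assumption: a `module` command (Environment Modules or Lmod) is available.
MODULE_TREE=/shared/apps/modules/ubuntu/modulefiles

# Hypothetical helper; $1 is a module name such as rocm/7.2.0.
load_shared_module() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; source the site setup script (e.g., aac.modules.bash) first" >&2
        return 1
    fi
    # Register the shared tree, then load the requested module.
    module use "$MODULE_TREE" && module load "$1"
}
```

After sourcing this, `load_shared_module rocm/7.2.0` followed by `module list` should show whether the module was picked up. Check `module avail` for the versions actually installed on your cluster.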

/shared/apps2

Purpose: Legacy shared software tree.

Access: Read-only for regular users.

Availability: MI325X only. Not present on MI355X.

Note: New software installations should use /shared/apps. This path is maintained for backward compatibility.

/shared/data

Purpose: Shared scratch and dataset storage area.

Access: Read/write for all users with appropriate permissions.

Use cases:

- Large datasets (training data, validation sets)
- Pre-built Enroot/Singularity container images (.sqsh files)
- Shared model checkpoints and weights
- Collaborative project data
- Temporary scratch space for large jobs

Example:

ls /shared/data/
# Example contents: datasets/, models/, containers/, scratch/
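Because /shared/data may hold pre-built .sqsh images, a small helper for locating them can be handy. This is a minimal sketch: the function name `list_sqsh_images` and the search depth of 3 are illustrative choices, and the actual directory layout on your cluster may differ from the example listing.

```shell
# Sketch: list .sqsh container images under a directory.
# Default root is /shared/data; the depth limit of 3 is an arbitrary choice
# that keeps the scan cheap on a network filesystem.
list_sqsh_images() {
    find "${1:-/shared/data}" -maxdepth 3 -type f -name '*.sqsh' 2>/dev/null | sort
}
```

Calling `list_sqsh_images` with no argument scans /shared/data; pass another directory to search elsewhere.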

Local disk on compute nodes

Purpose: Ephemeral local storage on each compute node.

Location: Fast node-local disk, typically mounted at /tmp or /scratch.

Characteristics:

- Ephemeral: Data is deleted when your job ends or the node is rebooted
- Node-local: Not visible from other nodes
- Fast I/O: Use for temporary files, intermediate results, or I/O-intensive workloads

Best practices:

- Use for temporary files that don't need to persist
- Copy final results back to $HOME or /shared/data before the job completes
- Ideal for shuffle/sort operations, temporary checkpoints, or build artifacts

Example:

# Use local disk for temporary files:
export TMPDIR=/tmp
cd "$TMPDIR" && tar xzf /shared/data/large-dataset.tar.gz
# ... process data ...
# Copy results back to persistent storage before the job ends:
cp results.txt "$HOME"/
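Since /tmp is shared by every job on a node, working in a private subdirectory avoids collisions and makes cleanup straightforward. The sketch below shows one way to do that. The helper name `run_in_scratch` and its argument order are hypothetical and not part of any cluster tooling.

```shell
# Sketch: run a command in a private scratch directory on node-local disk,
# then copy one result file back to persistent storage.
# Usage: run_in_scratch <dest_dir> <result_file> <command> [args...]
# The function body is a subshell, so the cd and the variables do not leak
# into the calling shell.
run_in_scratch() (
    dest=$1 result=$2
    shift 2
    # mktemp -d creates a unique directory, so concurrent jobs do not collide.
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX") || exit 1
    # Remove the scratch directory when the subshell exits, on success or failure.
    trap 'rm -rf "$scratch"' EXIT
    cd "$scratch" || exit 1
    "$@" || exit $?
    cp "$result" "$dest"/
)
```

For example, `run_in_scratch "$HOME" results.txt "$HOME/process.sh"` would run a (hypothetical) processing script inside the scratch directory and copy `results.txt` back to $HOME. Commands must be given by absolute path or be on PATH, because the working directory changes before they run.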

Storage best practices

  1. Use /shared/data for large files: Store datasets, models, and container images in /shared/data to share across nodes and users.

  2. Keep code in $HOME: Store your source code, scripts, and personal files in $HOME for easy access and version control.

  3. Read-only shared resources: Applications and libraries in /shared/apps are typically managed by administrators. Contact support if you need additional software installed.

  4. Container mounts: When running containers with Podman, Enroot, or Pyxis, mount the necessary shared paths:

     podman run -v $HOME:/workdir -v /shared/data:/shared/data -v /shared/apps:/shared/apps ...

  5. Quota awareness: Check with your administrator about storage quotas for $HOME and /shared/data.
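For the container mounts in item 4, the set of shared paths differs between clusters (for example, /shared/apps2 exists only on MI325X). A minimal sketch of building the mount flags from whichever paths exist is shown below. The helper name `shared_mount_args` is hypothetical, and the approach assumes none of the paths contain whitespace.

```shell
# Sketch: print one "-v SRC:DST" pair per line for $HOME and for every
# candidate path that exists on this node. Paths that are missing
# (e.g., /shared/apps2 on MI355X) are skipped silently.
# Assumption: no path contains whitespace, so the output can be word-split.
shared_mount_args() {
    printf '%s %s:/workdir\n' -v "$HOME"
    for path in "$@"; do
        [ -d "$path" ] && printf '%s %s:%s\n' -v "$path" "$path"
    done
    return 0
}
```

A possible invocation, relying on word splitting of the unquoted substitution, is `podman run $(shared_mount_args /shared/data /shared/apps /shared/apps2) --workdir /workdir <image>`.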

Quotas and retention policy

Quota limits and data retention policies are managed by cluster operations and may change over time. Contact your administrator or support team to confirm:

- Current quota limits for $HOME and /shared/data
- Retention policies for shared scratch areas
- Backup schedules and restore procedures
- The process for requesting quota increases if needed

Checking available space

To check available storage space:

# Check $HOME quota and usage
quota -s

# Check shared filesystem usage
df -h /shared/data /shared/apps

# Check local disk usage on compute node
df -h /tmp
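The checks above can also be scripted, for instance at the start of a job. The following is a sketch only: the function names and the default threshold of 90% are illustrative choices, not cluster policy. It reports filesystem-level usage from `df`, which is not the same as a per-user quota.

```shell
# Sketch: print the usage percentage (integer, no % sign) of the filesystem
# that holds the path given as $1.
fs_usage_pct() {
    # df -P guarantees one line per filesystem; field 5 is the capacity, e.g. "42%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn on stderr when a path is at or above a threshold (default: 90).
# Returns 1 if the usage could not be determined.
warn_if_full() {
    pct=$(fs_usage_pct "$1")
    case $pct in
        ''|*[!0-9]*) return 1 ;;   # df failed or reported no numeric capacity
    esac
    [ "$pct" -ge "${2:-90}" ] && echo "WARNING: $1 is ${pct}% full" >&2
    return 0
}
```

For example, `warn_if_full /shared/data 85` prints a warning if the filesystem backing /shared/data is at least 85% full.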