Storage and Shared Filesystems
This page explains the shared filesystem layout available on AAC clusters (MI325X and MI355X).
Shared filesystem paths
Both MI325X and MI355X clusters provide the following shared storage locations accessible from all compute nodes via NFS:
$HOME (/shared/amdgpu/home/<username>)
Purpose: User-specific home directory.
Access: Private to each user (read/write).
Location: NFS-backed at /shared/amdgpu/home/<username>, visible on the login nodes and all compute nodes.
Characteristics:
- Mounted automatically on all nodes
- Persists across sessions and job allocations
- Suitable for personal scripts, source code, and small datasets
- Backed up (check with your administrator for backup policies)
Best practices:
- Keep source code and personal scripts in $HOME
- Mount $HOME into containers as /workdir for seamless access
- Avoid storing very large datasets in $HOME (use /shared/data instead)
Example:
# When running containers, mount $HOME as /workdir:
podman run -v $HOME:/workdir --workdir /workdir docker://rocm/pytorch-training:v25.5
/shared/apps
Purpose: Shared applications, libraries, and software installations.
Access: Read-only for regular users; managed by administrators.
Use cases:
- ROCm module tree: /shared/apps/modules/ubuntu/modulefiles
- Custom-built applications and tools
- Shared Python virtual environments
- Pre-installed frameworks and libraries
- Environment setup scripts (e.g., aac.modules.bash)
Example:
ls /shared/apps/modules/ubuntu/modulefiles/
# Example contents: rocm/7.2.0, gcc/, python/
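A minimal sketch of how the module tree above might be used from a shell session. It assumes an Environment Modules or Lmod `module` command is already defined in your shell, and that the tree is not yet on your module search path; neither is confirmed by this page. The `MODULE_TREE` variable name is illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: add the shared module tree to the search path and list ROCm modules.
# Assumption: the `module` command is provided by Environment Modules or Lmod.
# MODULE_TREE is a hypothetical variable name used only in this example.

MODULE_TREE=/shared/apps/modules/ubuntu/modulefiles

if command -v module >/dev/null 2>&1 && [ -d "$MODULE_TREE" ]; then
    module use "$MODULE_TREE"   # prepend the tree to MODULEPATH
    module avail rocm           # list the ROCm versions found there
else
    echo "module command or $MODULE_TREE not available on this host" >&2
fi
```

If the site setup script already configures the module path for you, the `module use` step is unnecessary.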
/shared/apps2
Purpose: Legacy shared software tree.
Access: Read-only for regular users.
Availability: MI325X only. Not present on MI355X.
Note: New software installations should use /shared/apps. This path is maintained for backward compatibility.
/shared/data
Purpose: Shared scratch and dataset storage area.
Access: Read/write for users with the appropriate permissions.
Use cases:
- Large datasets (training data, validation sets)
- Pre-built Enroot/Singularity container images (.sqsh files)
- Shared model checkpoints and weights
- Collaborative project data
- Temporary scratch space for large jobs
Example:
ls /shared/data/
# Example contents: datasets/, models/, containers/, scratch/
Local disk on compute nodes
Purpose: Ephemeral local storage on each compute node.
Access: Fast local disk, typically /tmp or /scratch.
Characteristics:
- Ephemeral: Data is deleted when your job ends or the node is rebooted
- Node-local: Not visible from other nodes
- Fast I/O: Use for temporary files, intermediate results, or I/O-intensive workloads
Best practices:
- Use for temporary files that don't need to persist
- Copy final results back to $HOME or /shared/data before job completion
- Ideal for shuffle/sort operations, temporary checkpoints, or build artifacts
Example:
# Use local disk for temporary files:
export TMPDIR=/tmp
cd /tmp && tar xzf /shared/data/large-dataset.tar.gz
# ... process data ...
cp results.txt $HOME/
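The example above unpacks directly into /tmp, where files from different jobs on the same node can collide. One possible refinement, sketched below, is to work in a private directory created with `mktemp` and remove it once results are copied out. The names `LOCAL_ROOT`, `stage_in`, and `stage_out` are hypothetical helpers written for this illustration, not part of any cluster tooling.

```shell
#!/usr/bin/env bash
# Sketch: stage a dataset onto node-local disk, work there, copy results back.
# LOCAL_ROOT, stage_in, and stage_out are hypothetical names for illustration.

LOCAL_ROOT="${TMPDIR:-/tmp}"

# Create a private working directory on local disk and unpack the archive
# into it. Prints the directory path on stdout.
stage_in() {
    local archive="$1"
    local workdir
    workdir="$(mktemp -d "${LOCAL_ROOT%/}/job.XXXXXX")" || return 1
    tar -xzf "$archive" -C "$workdir" || { rm -rf "$workdir"; return 1; }
    printf '%s\n' "$workdir"
}

# Copy one result file to persistent storage, then remove the local
# working directory. The directory is kept if the copy fails.
stage_out() {
    local workdir="$1" result="$2" dest="$3"
    mkdir -p "$dest" && cp "$workdir/$result" "$dest/" && rm -rf "$workdir"
}
```

A job script could call `work="$(stage_in /shared/data/large-dataset.tar.gz)"`, process the data inside `$work`, and finish with `stage_out "$work" results.txt "$HOME"`.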
Storage best practices
- Use /shared/data for large files: Store datasets, models, and container images in /shared/data to share across nodes and users.
- Keep code in $HOME: Store your source code, scripts, and personal files in $HOME for easy access and version control.
- Read-only shared resources: Applications and libraries in /shared/apps are typically managed by administrators. Contact support if you need additional software installed.
- Container mounts: When running containers with Podman, Enroot, or Pyxis, mount the necessary shared paths:
  podman run -v $HOME:/workdir -v /shared/data:/shared/data -v /shared/apps:/shared/apps ...
- Quota awareness: Check with your administrator about storage quotas for $HOME and /shared/data.
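For the container mounts item above, note that not every shared path exists on every cluster (/shared/apps2, for instance, is MI325X only). The sketch below builds `-v` arguments only for directories that are present, so the same script can run on both clusters. It assumes bash (for arrays); `build_mount_args` and `MOUNT_ARGS` are hypothetical names chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch: build -v arguments only for shared paths that exist on this node.
# build_mount_args and MOUNT_ARGS are hypothetical names for illustration.

# Fill the global array MOUNT_ARGS with "-v path:path" pairs for each
# argument that is an existing directory; missing paths are skipped.
build_mount_args() {
    MOUNT_ARGS=()
    local path
    for path in "$@"; do
        if [ -d "$path" ]; then
            MOUNT_ARGS+=(-v "$path:$path")
        fi
    done
}

# Example use (image name and command left to the reader):
#   build_mount_args /shared/data /shared/apps /shared/apps2
#   podman run -v "$HOME:/workdir" --workdir /workdir "${MOUNT_ARGS[@]}" ...
```

Keeping the arguments in an array rather than a string avoids word-splitting problems if a path ever contains whitespace.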
Quotas and retention policy
Quota limits and data retention policies are managed by cluster operations and may change over time. Contact your administrator or support team to confirm:
- Current quota limits for $HOME and /shared/data
- Retention policies for shared scratch areas
- Backup schedules and restore procedures
- The process for requesting quota increases if needed
Checking available space
To check available storage space:
# Check $HOME quota and usage
quota -s
# Check shared filesystem usage
df -h /shared/data /shared/apps
# Check local disk usage on compute node
df -h /tmp
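The `df` check above can also be wrapped in a small helper that a job script calls before writing large outputs. This is a sketch only: `usage_percent` and `warn_if_full` are hypothetical function names, the 90% default is an arbitrary illustration rather than a cluster policy, and the parsing relies on the POSIX output format of `df -P`.

```shell
#!/usr/bin/env bash
# Sketch: report how full a filesystem is and flag it above a threshold.
# usage_percent and warn_if_full are hypothetical helper names.

# Print the used-space percentage (an integer) of the filesystem holding $1.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 1 and print a warning if usage exceeds the limit ($2, default 90).
# Return 2 if the usage figure could not be read.
warn_if_full() {
    local path="$1" limit="${2:-90}" pct
    pct="$(usage_percent "$path")"
    case "$pct" in
        ''|*[!0-9]*) echo "could not read usage for $path" >&2; return 2 ;;
    esac
    if [ "$pct" -gt "$limit" ]; then
        echo "WARNING: $path is ${pct}% full (limit ${limit}%)" >&2
        return 1
    fi
    return 0
}
```

For example, `warn_if_full /shared/data 90 || exit 1` near the top of a job script stops the job early when the shared filesystem is nearly full.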
Related documentation
- How to Use Podman - Examples of mounting shared paths in containers
- Using Enroot with Pyxis - Container image storage and Pyxis integration
- AAC Slurm Cluster User Guide - General cluster usage and allocation