Set Up Conda Environment
How to Set Up a PyTorch Conda Environment on the AAC Slurm Cluster
Allocate an exclusive 8-GPU MIXXX job on Ubuntu from the partition 384C8G1H_MI300X_Ubuntu22 (--exclusive reserves the whole node and --mem=0 requests all of its memory)
salloc --exclusive --mem=0 --gres=gpu:8 -p 384C8G1H_MI300X_Ubuntu22
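Before loading modules, it can help to confirm that the shell is actually running inside the allocation. The following is a minimal sketch, assuming salloc exported SLURM_JOB_ID and SLURM_JOB_NODELIST into the session (standard Slurm behavior); the function name in_allocation is hypothetical and not part of the cluster tooling.

```shell
# Hypothetical helper: succeed only when running inside a Slurm allocation.
# SLURM_JOB_ID and SLURM_JOB_NODELIST are set by Slurm for allocated sessions.
in_allocation() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "job ${SLURM_JOB_ID} on ${SLURM_JOB_NODELIST:-unknown node}"
    else
        echo "no Slurm allocation detected; run salloc first" >&2
        return 1
    fi
}

# Report the allocation, but do not abort the shell if there is none.
in_allocation || true
```

If the helper reports no allocation, the salloc request may still be pending in the queue.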
In the allocated session, list the available modules and load ROCm
module avail
module load rocm-<ROCm module>
example - module load rocm-6.4.2
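One way to check that the module load took effect is to look for the ROCm command-line tools on PATH. This is a sketch only: it assumes the ROCm module puts rocm-smi and rocminfo on PATH, which may differ on this cluster, and check_tool is a hypothetical helper name.

```shell
# Hypothetical helper: report whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 -> $(command -v "$1")"
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumption: the ROCm module exposes these tools on PATH.
check_tool rocm-smi || echo "ROCm module may not be loaded; re-run 'module load'"
check_tool rocminfo || true
```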
Load anaconda3 modulefile <anaconda module>
module load <anaconda module>
example - module load anaconda3/25.5.1
Source the conda.sh file
. $CONDA_ROOT/etc/profile.d/conda.sh
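If CONDA_ROOT is unset, the source command above fails with a confusing path error. A guarded variant could look like the sketch below; it assumes the anaconda module exports CONDA_ROOT, as the step above implies, and source_conda is a hypothetical helper name.

```shell
# Hypothetical helper: source conda.sh from a given root, failing loudly if absent.
source_conda() {
    conda_sh="$1/etc/profile.d/conda.sh"
    if [ -n "$1" ] && [ -f "$conda_sh" ]; then
        . "$conda_sh"
    else
        echo "conda.sh not found under '$1'; is the anaconda module loaded?" >&2
        return 1
    fi
}

# Only query conda once its shell functions have been loaded.
if source_conda "${CONDA_ROOT:-}"; then
    conda --version
fi
```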
Create a conda environment named pt-stable with Python 3.10.12
conda create -n pt-stable python=3.10.12 -y
If conda create stops because the channel Terms of Service have not been accepted, accept them for the default channels and run the above command again
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
conda create -n pt-stable python=3.10.12 -y
Activate the conda environment that was just created
conda activate pt-stable
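To confirm that the activation worked before installing packages, the active environment name can be compared with the expected one. This is a small sketch, assuming conda sets CONDA_DEFAULT_ENV on activation (its documented behavior); env_is_active is a hypothetical helper name.

```shell
# Hypothetical helper: compare the active conda environment with an expected name.
# CONDA_DEFAULT_ENV is the variable conda sets on `conda activate`.
env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active pt-stable; then
    echo "pt-stable is active: $(python --version 2>&1)"
else
    echo "pt-stable is not active (found: ${CONDA_DEFAULT_ENV:-none})" >&2
fi
```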
Install the stable release of PyTorch built for ROCm. The selector at https://pytorch.org/get-started/locally/ lists the current install command for each ROCm version.
pip3 --no-cache-dir install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.4
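After the install finishes, a quick check from the same shell can show whether PyTorch imports and sees the GPUs. The sketch below assumes python3 resolves to the interpreter of the pt-stable environment; verify_torch is a hypothetical helper name. ROCm builds of PyTorch expose GPUs through the torch.cuda API, and torch.version.hip is None on non-ROCm builds.

```shell
# Hypothetical helper: print the PyTorch build and the GPUs it can see.
verify_torch() {
    if python3 -c "import torch" 2>/dev/null; then
        python3 -c "
import torch
print('torch', torch.__version__)
print('HIP runtime', torch.version.hip)      # None on non-ROCm builds
print('GPU available', torch.cuda.is_available())
print('GPU count', torch.cuda.device_count())
"
    else
        echo "torch is not importable in this environment" >&2
        return 1
    fi
}

# Run the check without aborting the shell on failure.
verify_torch || true
```

On the 8-GPU allocation requested above, a working install would be expected to report a GPU count of 8.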