How To Run TensorFlow Docker Application
Login To AAC
Login to https://aac.amd.com/.
Select Application
Click on Applications. Select TensorFlow.
In the Select An Application pop-up, select the desired TensorFlow version with container type as docker.
Note: In this case, we have selected TensorFlow 2-10 ROCm 5-4-1 version and container as docker.
New Workload
Click the New Workload button in the top right corner.
Select Team
If the user is assigned to more than one team, a pop-up window will ask them to select one of the customer teams they belong to. If only one team is assigned, this step is not required.
Note: In this case, we have selected Team as AMD Internal.
Click Start new workload button.
Click Next button to continue.
Select Resources
In the Select Resources page, specify the number of GPUs (e.g., 1 GPU) and the maximum allowed runtime for the workload. Click the Next button.
Note: The maximum number of GPUs is 8.
Select the cluster and the desired queue on which to run the job. In this case, 1CN128C8G2H_2IB_MI210_SLES15 (Pre-emptible) - AAC Plano is selected. Click Next.
Review Workload Submission
Review all the selected configurations and click Run Workload.
Once the workload is submitted, its status changes to Running when the queue is available. Click on the running workload.
The user can see system logs in the SYSLOG tab, output in the STDOUT tab, and errors in the STDERR tab.
A token is generated in the STDOUT tab, highlighted in yellow. Copy the token.
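If the STDOUT log is long, the token can also be pulled out programmatically. A minimal sketch, assuming the log contains a standard Jupyter URL of the form `...?token=<hex>` (the exact AAC log format may differ):

```python
import re

def extract_jupyter_token(stdout_text):
    """Return the first Jupyter token found in a log, or None.

    Assumes the log contains a URL like http://host:8888/?token=<hex>,
    which is the usual Jupyter startup-message format.
    """
    match = re.search(r"[?&]token=([0-9a-f]+)", stdout_text)
    return match.group(1) if match else None

# Example with a made-up log excerpt:
log = "Jupyter Server is running at: http://127.0.0.1:8888/?token=4ac7f0e1b2c3"
print(extract_jupyter_token(log))  # 4ac7f0e1b2c3
```

The extracted value is what gets pasted into the Password or token field in the next step.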
Interactive Endpoints
Once the interactive endpoints are enabled, click Connect to launch ML Studio (JupyterLab).
JupyterLab opens. Paste the token into the Password or token field and click Log in.
The user can now see JupyterLab, which can be used for Python-based development work.
Click on Terminal to open it.
In the terminal, run the following benchmark (set --num_gpus to match the number of GPUs requested for the workload): python3 /root/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model=resnet50 --num_gpus=8 --batch_size=256 --num_batches=100 --print_training_accuracy=True --variable_update=parameter_server --local_parameter_device=gpu
Collect Performance Metrics
Once the work in JupyterLab is done, close it.
Finish Workload
Click Finish Workload button.
Download Logs
Once the workload is finished, logs can be downloaded from the STDOUT tab by clicking Download Logs.
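The downloaded log can then be mined for the benchmark result. tf_cnn_benchmarks normally prints a summary line such as `total images/sec: 1181.47`; a minimal sketch of parsing it, assuming that summary format (the exact wording may vary between versions):

```python
import re

def parse_images_per_sec(log_text):
    """Return the 'total images/sec' value from a tf_cnn_benchmarks log,
    or None if no summary line is present."""
    match = re.search(r"total images/sec:\s*([0-9.]+)", log_text)
    return float(match.group(1)) if match else None

# Example with a made-up log excerpt:
sample = """100\timages/sec: 1180.1 +/- 2.0 (jitter = 8.3)\t0.001
----------------------------------------------------------------
total images/sec: 1181.47
----------------------------------------------------------------"""
print(parse_images_per_sec(sample))  # 1181.47
```

This makes it easy to compare throughput across runs with different GPU counts or batch sizes.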