Run Multinode TensorFlow Application
This guide shows how to run a multi-node TensorFlow application. Sign in to AAC if you have not already.
Select application
- Click Applications in the navigation bar.
- In the filter, type Multinode and wait for the TensorFlow application family to appear.
- Open the family and select the TensorFlow Multinode application.
- Click New Workload.
Select team
If you have more than one team, select one in the pop-up and click Launch.
Note
In this example, we use the AMD Internal team.
Select input files
Upload any input files via Upload files or Browse files, then click Next. You can reuse files from a previous workload. If you have no files to upload, click Next.
Configure run script
Set the input_script variable to the path of your Python script. If you leave it unchanged, the workload runs the default script. Uploaded input files are placed in /home/aac, so a script uploaded as custom_script.py is referenced as input_script=/home/aac/custom_script.py.
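In a multi-node run, the script itself has to coordinate the workers. TensorFlow's tf.distribute.MultiWorkerMirroredStrategy discovers the cluster layout from the TF_CONFIG environment variable. Whether this application exports TF_CONFIG on each node is an assumption, not something this guide states; a quick test run that prints os.environ will tell you. The minimal sketch below uses only the standard library to parse a TF_CONFIG-style value so your script can log how many workers it sees and which one it is. The helper name describe_cluster is hypothetical.

```python
import json
import os


def describe_cluster(tf_config=None):
    """Summarize a TF_CONFIG-style JSON string (hypothetical helper).

    Assumption: the platform sets TF_CONFIG in the format TensorFlow's
    multi-worker strategies expect. If it is unset, report a single worker.
    """
    if tf_config is None:
        tf_config = os.environ.get("TF_CONFIG", "")
    if not tf_config:
        # No cluster information: behave as one standalone worker.
        return {"num_workers": 1, "task_type": "worker", "task_index": 0, "is_chief": True}

    config = json.loads(tf_config)
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    task_type = task.get("type", "worker")
    task_index = int(task.get("index", 0))

    # Under MultiWorkerMirroredStrategy both "chief" and "worker" tasks train.
    num_workers = len(cluster.get("worker", [])) + len(cluster.get("chief", []))

    # With no explicit "chief" job, TensorFlow treats worker 0 as the chief.
    if "chief" in cluster:
        is_chief = task_type == "chief"
    else:
        is_chief = task_type == "worker" and task_index == 0

    return {"num_workers": num_workers, "task_type": task_type,
            "task_index": task_index, "is_chief": is_chief}


if __name__ == "__main__":
    print(describe_cluster())
```

In the training code itself you would normally create tf.distribute.MultiWorkerMirroredStrategy() and build the model inside strategy.scope(). The strategy resolves TF_CONFIG on its own, so a helper like this is only useful for logging and for deciding which worker saves outputs.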
Select resources
Set the number of nodes, GPUs per node, and maximum allowed runtime. Click Next.
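The node and GPU counts determine how many replicas the job trains on. Assuming the script uses MultiWorkerMirroredStrategy with one replica per GPU, the replica count is nodes × GPUs per node. A common (but not mandatory) convention is to scale the global batch size with that count so each GPU keeps the same per-replica batch. The hypothetical helper below only illustrates this arithmetic; it is not a platform feature.

```python
def plan_batch(nodes: int, gpus_per_node: int, per_replica_batch: int) -> tuple[int, int]:
    """Return (total_replicas, global_batch_size) for a data-parallel run.

    Assumes one replica per GPU, as with MultiWorkerMirroredStrategy.
    """
    if min(nodes, gpus_per_node, per_replica_batch) < 1:
        raise ValueError("all arguments must be positive")
    total_replicas = nodes * gpus_per_node
    return total_replicas, per_replica_batch * total_replicas


# Example: 2 nodes x 4 GPUs with 64 samples per GPU.
print(plan_batch(2, 4, 64))  # (8, 512)
```

With this convention you batch the dataset using the global size, and the strategy splits each global batch across the replicas.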
Select compute
Search for and select the queue and cluster on which to run the workload. After selecting the compute parameters, click Next.
Review workload submission
Review the workload details. Use Change in any section to edit. Estimated costs appear at the bottom. Click Run Workload to launch; you are redirected to the Workloads page.
Monitor workload
After you submit, monitor the workload on the Workloads page and the Workload Information page. Workload states:
- Created – The workload has been created in the system.
- Sent – The workload has been sent to the queue you selected during workload submission.
- Pending – The workload is waiting in the queue.
- Running – The workload has started running in the selected queue.
- Completed – The workload has finished successfully.
- Failed – A problem prevented the workload from completing successfully.
- Canceled – The user canceled the workload and it stopped running.
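If you track workloads from your own tooling, the states above map naturally onto an enumeration in which Completed, Failed, and Canceled end the run. This guide does not describe an AAC API for reading a workload's state, so in the sketch below get_state is a hypothetical, caller-supplied function that returns one of the state labels.

```python
import time
from enum import Enum
from typing import Callable


class WorkloadState(Enum):
    CREATED = "Created"
    SENT = "Sent"
    PENDING = "Pending"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"
    CANCELED = "Canceled"

    @property
    def is_terminal(self) -> bool:
        # These three states mean the workload is no longer running.
        return self in (WorkloadState.COMPLETED, WorkloadState.FAILED, WorkloadState.CANCELED)


def wait_for_workload(get_state: Callable[[], str], poll_seconds: float = 30.0) -> WorkloadState:
    """Poll a hypothetical get_state() until the workload reaches a terminal state."""
    while True:
        state = WorkloadState(get_state())  # raises ValueError for an unknown label
        if state.is_terminal:
            return state
        time.sleep(poll_seconds)
```

Returning the final state, rather than a boolean, lets the caller distinguish a failure from a user cancellation.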
Click the workload to open the Workload Information page. The left panel shows submission details (name, application, input files, run script, resources). The right panel shows activity, queue status, cost, and workload info.
View logs
Click the links on the Workload Information page to view the job log, stdout log, and stderr log:
- View Log - Information about the workload throughout the entire process.
- View Stdout - The workload's standard output, which often includes its results.
- View Stderr - The workload's standard error, which helps you diagnose problems encountered during the run.
- Download log files - Download the log, stdout, and stderr files.
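Since View Stdout and View Stderr show separate streams, it helps to keep them apart in your script: results on standard output, diagnostics on standard error. The minimal sketch below assumes the platform captures the script's own streams into these logs; the function names are illustrative, not part of the platform.

```python
import sys


def report_result(message: str) -> None:
    # Standard output: expected to appear under View Stdout.
    print(message, flush=True)


def report_problem(message: str) -> None:
    # Standard error: expected to appear under View Stderr.
    print(message, file=sys.stderr, flush=True)


if __name__ == "__main__":
    report_result("training finished")
    report_problem("warning: checkpoint directory not found")
```

Flushing on every call is a cautious choice so that messages are not held in a buffer if you check the logs while the workload is still running.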
When the script finishes, the workload completes automatically. Files generated in /home/aac are uploaded to Plexus.
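Because files written under /home/aac are uploaded to Plexus, a multi-node script should decide which worker writes what. This guide does not say whether /home/aac is shared across nodes, so the sketch below takes a conservative, assumed approach: every worker writes its own uniquely named file, and only the chief writes the single shared artifact. The helper output_paths and the results subdirectory are hypothetical.

```python
from pathlib import Path


def output_paths(base_dir: str, task_index: int, is_chief: bool) -> list[Path]:
    """Return the files this worker should write, creating the output directory.

    Assumption: base_dir is the uploaded directory (/home/aac in this guide);
    "results" is a hypothetical subdirectory name.
    """
    out_dir = Path(base_dir) / "results"
    out_dir.mkdir(parents=True, exist_ok=True)
    # Each worker gets its own file name, so no worker overwrites another's output.
    paths = [out_dir / f"worker_{task_index}.txt"]
    # Only the chief writes the shared artifact, such as a final summary.
    if is_chief:
        paths.append(out_dir / "summary.txt")
    return paths
```

Here task_index and is_chief would come from whatever cluster information your script has, for example TF_CONFIG, and a call such as output_paths("/home/aac", 0, True) would place the files where the upload step picks them up.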