Create Docker Application

This article explains how to create Docker applications in AAC. To complete this procedure, you need Developer or Admin permissions on AAC.

Open New Application

Click on the Applications option in the top bar menu, and then click on Applications again.
Once the application view is open, click the + New Application button at the top left.

Note: If you do not see the + New Application button, you do not have Developer or Admin permission.

Select Docker

Select Docker on the container type screen, and then click the NEXT button.

Configure General Attributes

In the general information panel, configure the general application attributes. Refer to the image below for guidance.

Name: Name of the application.
Description: Description of the application.
Family: Application family.
Version number: ROCm version of the application.
Allow replicas: Toggle on/off to allow the application to run on multiple nodes.
Categories: Assign a category to the application.
Architectures: Choose either x86_64 or arm64.
Accelerator: Select either AMD GPU or NVIDIA GPU.
Is service: Indicates that the application will be executed as a Docker interactive service.
Is stateful set: Toggle on/off for stateful set. This is a Kubernetes container feature.
Ports: Configure the application ports to be exposed to users. Multiple ports can be configured by clicking the Add Port button.
- Example for JupyterLab with SSH connection: ui: 8888, ssh: 22
Featured: Toggle on/off. When on, this application will appear at the top of the Applications list in the Featured section.
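
Each port entry above pairs a name with a port number. As a minimal sketch, a small shell helper can check such a pair before you type it into the form. The function name validate_port_entry and the "name:port" text format are assumptions made for this illustration and are not part of AAC.

```shell
#!/bin/sh
# Hypothetical helper: check a "name:port" pair before entering it in the form.
# The "name:port" notation is an assumption made for this sketch only.
validate_port_entry() {
    name=${1%%:*}   # text before the first colon
    port=${1#*:}    # text after the first colon
    # The entry must contain a colon and a non-empty name.
    [ -n "$name" ] && [ "$name" != "$1" ] || return 1
    # The port must consist of digits only.
    case $port in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Valid TCP port numbers range from 1 to 65535.
    [ "$port" -ge 1 ] && [ "$port" -le 65535 ]
}

# Check the JupyterLab example from the list above.
for entry in ui:8888 ssh:22; do
    if validate_port_entry "$entry"; then
        echo "ok: $entry"
    else
        echo "invalid: $entry"
    fi
done
```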

Stateful Sets

Stateful sets are useful in Kubernetes when you need to maintain a predictable hostname list for multi-pod jobs, as shown in the following example:

some_hostname_prefix_0 is the hostname of the 1st replica.
some_hostname_prefix_1 is the hostname of the 2nd replica.
some_hostname_prefix_2 is the hostname of the 3rd replica.

With stateful sets, each pod keeps the same hostname across failures and restarts. This is beneficial for MPI jobs or jobs where pods need to register with a centralized server, such as Spark and NiFi. Because the hostnames follow a predictable pattern, the host list for your job stays fixed even after a pod fails or restarts.
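
Because the hostnames follow a fixed pattern, the host list can be generated from the prefix and the replica count alone. The sketch below assumes the prefix_index pattern shown above. The function name build_host_list is hypothetical, and the real prefix depends on your workload.

```shell
#!/bin/sh
# Sketch: print the fixed host list for a stateful set with N replicas.
# "some_hostname_prefix" is the placeholder prefix used in the text above.
build_host_list() {
    prefix=$1
    replicas=$2
    i=0
    while [ "$i" -lt "$replicas" ]; do
        echo "${prefix}_${i}"
        i=$((i + 1))
    done
}

# Print one hostname per line, for example to redirect into an MPI hostfile.
build_host_list some_hostname_prefix 3
```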

Repository Authentication

If your application uses containers stored in a private image repository, you will need to set up the following authentication fields:

User: The username of the Docker image repository account.
Password: The password of the Docker image repository account. This can also be the security token provided by some repositories, such as NVIDIA.
Server: The Docker repository server URL. For Docker Hub, this is: https://index.docker.io/v1/
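
Before entering these values, it can help to confirm that they work from a machine with the Docker CLI installed. The following is a sketch only: the function name check_registry_login and the REGISTRY_PASSWORD variable are hypothetical, and AAC itself may authenticate differently.

```shell
#!/bin/sh
# Sketch: try the repository credentials locally before entering them in AAC.
# Assumes the Docker CLI is installed. REGISTRY_PASSWORD is a hypothetical
# environment variable holding the password or security token.
check_registry_login() {
    user=$1
    server=$2
    if [ -z "${REGISTRY_PASSWORD:-}" ]; then
        echo "REGISTRY_PASSWORD is not set" >&2
        return 1
    fi
    # --password-stdin reads the secret from standard input instead of
    # taking it as a command-line argument.
    printf '%s' "$REGISTRY_PASSWORD" |
        docker login --username "$user" --password-stdin "$server"
}

# Example call (not run here), using the Docker Hub server URL from above:
#   check_registry_login myuser https://index.docker.io/v1/
```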

After filling in the application attributes, click the NEXT button.

Configure Settings

In the settings panels, you can configure the application settings.

The most common settings to configure are:

Help URI: A URL containing help articles on how to run the application.
Workload default prerun script: The default script to execute before the container scripts are run. This can also be defined during the job submission process.
Workload default postrun script: The default script to execute after the container scripts are run. This can also be defined during the job submission process.

After configuring the application settings, click the NEXT button.

Configure Containers

First, you need to select an existing container or create a new one.

Create Container

Click the ellipsis (…) button, and then select Add new container from the dropdown menu.

Fill in the container information parameters:

Name: Container name.
URL: Container URL.
Architectures: Select either x86_64 or aarch64.
Version: Container version.
GPU: Check this box if the container requires GPU resources.
Description (optional): Description of the application container.

Click SAVE to save the container.

Configure Container

Click on Select container, then choose the container you have just created. After that, click the Add application container button.

Set the following parameters to configure the application:

Name: The name to be used for this container within the application. Each container name must be unique within a given application. You may use the same container multiple times within an application, but each instance must have a different name (e.g., redis-server-1, redis-server-2).
Order: The order of execution for the container within the application. Use integer format (e.g., 1, 2, 3).
Number of CPUs: The number of CPUs required for the container execution. Use integer format. If set to 0, it will run without limits or under cluster default limits.
Memory: The amount of memory (in megabytes) required for the container execution. Use integer format. If set to 0, it will run without limits or under cluster default limits.
Number of GPUs: The number of GPUs required for the container execution. If the container requires a GPU and no value is set, the AAC will use 1 GPU by default. Use integer format.
Mount list: List of mount points in the container. Add new mounts by clicking Add mount. Volumes can have the following formats:
- ./host/dir:/container/dir:rw. The access mode can be read/write (rw) or read-only (ro).
  - Host mount points are located under /home/aac. For example, ./host/dir corresponds to /home/aac/host/dir.
- tmpfs mount points can also be defined using this format: /container_dir.tmpfs:tmpfs.
  - Examples: ./redis:/redis:ro, ./redis/data:/data_redis:rw, or container/dir_tmpfs:tmpfs.
  - Shared memory (shm) is configured by default in the system.
Environment variables: List of environment variables to set up inside the container. Add new variables by clicking Add Environment Var.
- Format: VAR1=value1
- Examples: HOSTNAME=localhost, PORT=6379
Health check: Command to test the container’s health. If the health check fails, the container will be restarted.
- In K8s services, if the container has more than 10 unsuccessful restart attempts, the application execution will be marked as Failed.
Ready check: Command to verify that the container is ready to be exposed to the client.
- Ports are not opened until the container is ready.
Prerun script: Command to be executed before the run script.
Run script: Command to run the container. Analytics metrics are tracked over this script.
- For batch jobs, this is a permanent command that cannot be modified for each run (see next item, Example run script).
Example Run script: Parameter available only for batch (non-service) applications.
- Here you can include the command you want to execute.
Postrun script: Command to run in the container after the application completes execution.
- This script is executed only in batch job applications when they are not forced to stop.
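
The volume format described under Mount list can be checked before it is entered in the form. The following is a minimal sketch. The function name check_mount is hypothetical, and it covers only the host:container:mode form, not tmpfs entries.

```shell
#!/bin/sh
# Sketch: check a volume entry of the form ./host/dir:/container/dir:rw.
# Hypothetical helper, not part of AAC; tmpfs entries are not handled.
check_mount() {
    spec=$1
    mode=${spec##*:}     # text after the last colon
    rest=${spec%:*}      # text before the last colon
    host=${rest%%:*}     # text before the first colon
    target=${rest#*:}    # text between the first and last colon
    # Both colons must be present and the host path must not be empty.
    [ "$rest" != "$spec" ] && [ "$host" != "$rest" ] && [ -n "$host" ] || return 1
    # The container path must be absolute and contain no further colon.
    case $target in
        *:*) return 1 ;;
        /*) ;;
        *) return 1 ;;
    esac
    # The access mode must be read/write (rw) or read-only (ro).
    case $mode in
        rw|ro) return 0 ;;
        *) return 1 ;;
    esac
}

for spec in ./redis:/redis:ro ./redis/data:/data_redis:rw ./redis:/redis; do
    if check_mount "$spec"; then
        echo "ok: $spec"
    else
        echo "invalid: $spec"
    fi
done
```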

After filling in the container parameters, click Save to save the container configuration, and then click NEXT.
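
As an illustration of a Health check or Ready check command, the sketch below tests a Redis container like the one used in the naming and environment variable examples. It rests on several assumptions that the text does not confirm: redis-cli is installed in the image, PORT is set as in the environment variable example, and a non-zero exit status is what marks the check as failed.

```shell
#!/bin/sh
# Sketch of a health check for a Redis container.
# Assumptions: redis-cli exists in the image, PORT is set (6379 is used as a
# fallback), and a non-zero exit status is treated as a failed check.
redis_health_check() {
    # "redis-cli ping" prints PONG when the server answers.
    [ "$(redis-cli -p "${PORT:-6379}" ping 2>/dev/null)" = "PONG" ]
}

# Example use as the check command (not run here):
#   redis_health_check || exit 1
```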

Examples

Interactive Application

The image illustrates the configuration of an Ubuntu SSH interactive application. The application needs scripts that configure the aac user, install the SSH service, and declare the container-level environment variables in /etc/bash.bashrc.

The prerun script contains the following:

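# Create the aac user and set its password to the value of PLEXUS_SECRET_KEY.
useradd aac
echo "aac:$PLEXUS_SECRET_KEY" | chpasswd
# Print the login credentials.
echo "USERNAME: aac"
echo "PASSWORD: $PLEXUS_SECRET_KEY"
# Add aac to the sudo group.
usermod -aG sudo aac
# Match the host user and group IDs when both are provided.
if [ -n "$HOST_USER_ID" ] && [ -n "$HOST_GROUP_ID" ]; then
  groupmod -g "$HOST_GROUP_ID" aac
  usermod -u "$HOST_USER_ID" -g "$HOST_GROUP_ID" aac
fi
usermod --shell /bin/bash aac
# Append the exported environment variables (except HOME) to /etc/bash.bashrc.
export -p | grep -v "declare -x HOME" >> /etc/bash.bashrc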

The run script installs the SSH daemon and keeps the container running:

echo "starting ssh daemon..."
apt-get update && apt-get install -y openssh-server && service ssh start
while true; do sleep 100; done

Batch Application

The image illustrates the configuration of a NAMD batch application.

Multiple Containers in an Application

Only service-type applications (where the Is service option has been selected) can support multiple containers. Therefore, the interface will only allow you to add a single container to applications where the Is service option has not been selected.

For services, multiple instances of the same container can be configured within the same application, but each container must have a unique name to be used inside the application.
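
The uniqueness rule can be checked on a planned list of container names before they are entered. This is a minimal sketch, and the function name find_duplicate_names is hypothetical.

```shell
#!/bin/sh
# Sketch: report container names that are used more than once.
find_duplicate_names() {
    # sort groups equal names together; uniq -d prints one copy of each
    # name that occurs more than once.
    printf '%s\n' "$@" | sort | uniq -d
}

dups=$(find_duplicate_names redis-server-1 redis-server-2 redis-server-1)
if [ -n "$dups" ]; then
    echo "duplicate container names: $dups"
fi
```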

Review the Application

Review the application details and make any necessary edits or corrections.

Click CREATE to execute the creation procedure.

Connect to the Container via SSH

Once a workload is submitted, it will move through the queue and begin once the required resources are available. When the workload's state transitions from Pending to Running, you can access its details.

Accessing Workload Details

To view the SSH connection options, navigate to the workload in the dashboard. Once it is in the Running state, click the workload to view its details. On the right-hand side of the page, you will see the following sections:

Interactive Endpoints
Service Terminal

You can connect to the container using either of these methods. Below is an overview of both options.

1. Service Terminal

The Service Terminal is a quick access method to interact with the container via a websocket-based terminal. However, this method is not recommended for development purposes. It is primarily useful for administrative tasks or quick troubleshooting, but it is prone to timeouts and potential disconnections due to the nature of the websocket connection.

Limitations: This option is not suitable for long-running tasks or development work.
Usage: Click on Service Terminal to open the connection. This will redirect you to a terminal interface where you can interact with the container, as shown below. However, bear in mind that it may disconnect if idle for too long or experience connection issues.

2. Interactive Endpoints (Recommended)

For development purposes or a more stable SSH connection, we recommend using Interactive Endpoints. This method provides you with a secure SSH connection to the container, using SSH keys for authentication.

Important: This container is accessible over the internet using the CLI, meaning you can connect remotely to the container via SSH from any internet-connected machine using the provided details.

Step-by-Step Guide to Connecting via Interactive Endpoints

Step 1: Locate the Interactive Endpoints section

Once the workload is in the Running state, on the right-hand side of the workload details page, scroll down to the Interactive Endpoints section. Here, you will find two key pieces of information:

SSH Port: The port to use when establishing the SSH connection.
Secret Key: The private key used for SSH authentication.

Step 2: Click Connect

Click on the Connect button to reveal the SSH URL and Password for your container as shown below. The SSH URL will be displayed in the following format:

ssh -p <port> <username>@<hostname>

For example:

ssh -p 99999 aac@aac1.amd.com

The Password field will contain a password used for the initial connection.

Step 3: Connect from your local terminal

Open a terminal on your local machine.
Copy the SSH URL and paste it into your terminal as shown below.

Example:

ssh -p 99999 aac@aac1.amd.com

When prompted, enter the password you copied from the Password field. This will authenticate the connection.

Once connected, you will have access to the container, where you can run commands, check logs, and perform development tasks as needed.
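
If you reconnect often, the command can be assembled from the values shown on the workload page. The sketch below is illustrative only: the function name ssh_command is hypothetical, the port 2222 is a placeholder, and the ServerAliveInterval option is an optional OpenSSH client setting that this procedure does not require.

```shell
#!/bin/sh
# Sketch: assemble the SSH command from the values on the workload page.
# The port, user, and host passed below are placeholders.
ssh_command() {
    port=$1
    user=$2
    host=$3
    # ServerAliveInterval=60 makes the client send a keepalive probe after
    # 60 seconds without data from the server.
    echo "ssh -p $port -o ServerAliveInterval=60 $user@$host"
}

# Print the command; copy it into your terminal to connect.
ssh_command 2222 aac aac1.amd.com
```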