
Create a Docker Application

This guide explains how to create Docker applications in AMD Accelerator Cloud (AAC). You need Developer or Admin permissions.

Open new application

  1. Click Applications in the top bar, then Applications again.
  2. Click + New Application at the top left.

Open new application

Note

If you do not see + New Application, you do not have Developer or Admin permission.

Select Docker

Select Docker on the container type screen, then click Next.

Select Docker

Configure general attributes

In the general information panel, configure the application attributes.

  • Name: Name of the application.
  • Description: Description of the application.
  • Family: Application family.
  • Version number: ROCm version of the application.
  • Allow replicas: Toggle on/off to allow the application to run on multiple nodes.
  • Categories: Assign a category to the application.
  • Architectures: Choose either x86_64 or arm64.
  • Accelerator: Select either AMD GPU or NVIDIA GPU.
  • Is service: Indicates that the application will be executed as a Docker interactive service.
  • Is stateful set: Toggle on/off for stateful set. This is a Kubernetes container feature.
  • Ports: Configure the application ports to be exposed to users. Click Add Port to add more. Example for JupyterLab with SSH: ui: 8888, ssh: 22.
  • Featured: Toggle on/off. When on, this application will appear at the top of the Applications list in the Featured section.

Stateful sets

Stateful sets are useful in Kubernetes when you need a predictable hostname list for multi-pod jobs:

  • some_hostname_prefix_0 is the hostname of the 1st replica.
  • some_hostname_prefix_1 is the hostname of the 2nd replica.
  • some_hostname_prefix_2 is the hostname of the 3rd replica.

With stateful sets, each pod keeps the same hostname across failures and restarts. This is beneficial for MPI jobs or jobs where pods need to register with a centralized server, such as Spark and NiFi. Because the hostnames follow a predictable pattern, the host list for your job stays fixed even after a pod fails or restarts.
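As a rough illustration of why the predictable pattern helps, the sketch below builds a fixed host list from a prefix and a replica count. The build_host_list helper and the some_hostname_prefix value are placeholders that follow the pattern above, not names provided by AAC; the real prefix depends on your deployment.

```shell
#!/bin/sh
# Hypothetical helper: print one hostname per replica, following the
# <prefix>_<index> pattern described above (indices start at 0).
build_host_list() {
  prefix="$1"
  replicas="$2"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    printf '%s_%s\n' "$prefix" "$i"
    i=$((i + 1))
  done
}

# Example: a 3-replica job. The output could be saved as a host file.
build_host_list some_hostname_prefix 3
```

Because the list depends only on the prefix and the replica count, it can be generated before the pods start and reused after a restart.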

Repository authentication

If your application uses containers stored in a private image repository, you will need to set up the following authentication fields:

  • User: The username of the Docker image repository account.
  • Password: The password of the Docker image repository account. This can also be the security token provided by some repositories, such as NVIDIA.
  • Server: The Docker repository server URL. For Docker Hub, this is: https://index.docker.io/v1/

After filling in the attributes, click Next.

General attributes

Configure settings

In the settings panel, configure the application settings.

The most common settings to configure are:

  • Help URI: A URL containing help articles on how to run the application.
  • Workload default prerun script: Applies to batch workloads. The default script to execute before the container scripts are run. This can also be defined during the job submission process.
  • Workload default postrun script: Applies to batch workloads. The default script to execute after the container scripts are run. This can also be defined during the job submission process.
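A default prerun script could look like the sketch below, which stages input files into a run folder before the container scripts start. It assumes the workspace is /home/aac; the WORKSPACE variable, the stage_inputs helper, and the inputs and run_data folder names are invented for illustration.

```shell
#!/bin/sh
# Hypothetical prerun sketch: copy input files into a per-run folder.
# WORKSPACE defaults to /home/aac; "inputs" and "run_data" are made-up names.
stage_inputs() {
  workspace="${WORKSPACE:-/home/aac}"
  mkdir -p "$workspace/run_data" || return 1
  # Copy only if the inputs folder exists, so the script is safe to rerun.
  if [ -d "$workspace/inputs" ]; then
    cp -R "$workspace/inputs/." "$workspace/run_data/"
  fi
}

# In a real prerun script, end with a call to the helper:
# stage_inputs
```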

After configuring the settings, click Next.

Settings

Configure containers

Select an existing container or create a new one. Ensure container images include bash; Alpine-based images often do not have bash by default.
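One way to confirm that bash is present is to run a small check inside the image, for example with docker run --rm --entrypoint sh <image> -c 'command -v bash'. The sketch below shows the same check as a script; has_command is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is found on PATH.
has_command() {
  command -v "$1" >/dev/null 2>&1
}

if has_command bash; then
  echo "bash found"
else
  echo "bash missing; install it in the image (for example, 'apk add bash' on Alpine)"
fi
```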

Create container

Click the ellipsis (…) button and select Add new container from the dropdown.

Create container

Fill in the container information parameters:

  • Name: Container name.
  • URL: Container URL.
  • Architectures: Select either x86_64 or aarch64.
  • Version: Container version.
  • GPU: Check this box if the container requires GPU resources.
  • Description (optional): Description of the application container.

Click Save to save the container.

Fill container

Configure container

Click Select container, choose the container you created, then click Add application container.

Configure container

Set the following parameters to configure the application:

  • Name: The name to be used for this container within the application. Each container name must be unique within a given application. You may use the same container multiple times within an application, but each instance must have a different name (e.g., redis-server-1, redis-server-2).
  • Order: The order of execution for the container within the application. Use integer format (e.g., 1, 2, 3).
  • Number of CPUs: The number of CPUs required for the container execution. Use integer format. If set to 0, it will run without limits or under cluster default limits.
  • Memory: The amount of memory (in megabytes) required for the container execution. Use integer format. If set to 0, it will run without limits or under cluster default limits.
  • Number of GPUs: The number of GPUs required for the container execution. If the container requires a GPU and no value is set, AAC will use 1 GPU by default. Use integer format.
  • Mount list: List of mount points in the container. Add new mounts by clicking Add mount. Volumes can have the following formats:
    • ./host/dir:/container/dir:rw. The access mode can be read/write (rw) or read-only (ro).
      • Mount points will be under /home/aac. For example, /home/aac/host/dir.
    • tmpfs mount points can also be defined using this format: /container_dir.tmpfs:tmpfs.
      • Example: ./redis:/redis:ro, ./redis/data:/data_redis:rw, or container/dir_tmpfs:tmpfs.
      • Shared memory (shm) is configured by default by the system.
  • Environment variables: List of environment variables to set up inside the container. Add new variables by clicking Add Environment Var.
    • Format: VAR1=value1
    • Examples: HOSTNAME=localhost, PORT=6379
    • Storage Paths: To ensure paths point to NFS (persistent) storage instead of ephemeral storage, use the format /home/aac/<path>.
  • Health check: Command to test the container’s health. If the health check fails, the container will be restarted.
    • In K8s services, if the container has more than 10 unsuccessful restart attempts, the application execution will be marked as Failed.
  • Ready check: Command to verify that the container is ready to be exposed to the client.
    • Ports are not opened until the container is ready.
  • Prerun script: Command to be executed before the run script.
  • Run script: Command to run the container. Analytics metrics are tracked over this script.
    • For batch jobs, this is a permanent command that cannot be modified for each run (see next item, Example run script).
  • Example Run script: Parameter available only for batch (non-service) applications.
    • Here you can include the command you want to execute.
  • Postrun script: Command to run in the container after the application completes execution.
    • This script is executed only in batch job applications when they are not forced to stop.
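To make the volume format in the Mount list item concrete, the sketch below splits a mount entry into its three parts and prints where the host path would resolve under /home/aac. The resolve_mount helper is hypothetical, it requires an explicit rw or ro mode, and it does not handle tmpfs entries; AAC performs its own parsing.

```shell
#!/bin/sh
# Hypothetical helper: split a "./host/dir:/container/dir:mode" mount entry
# and print the host path as it would resolve under /home/aac.
resolve_mount() {
  spec="$1"
  host="${spec%%:*}"      # text before the first colon
  rest="${spec#*:}"       # text after the first colon
  target="${rest%%:*}"    # container path
  mode="${rest#*:}"       # expected to be rw or ro
  case "$mode" in
    rw|ro) ;;
    *) echo "unsupported mode: $mode" >&2; return 1 ;;
  esac
  printf '/home/aac/%s -> %s (%s)\n' "${host#./}" "$target" "$mode"
}

resolve_mount "./redis/data:/data_redis:rw"
```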

Important

Plexus does not allow modifying container run scripts at execution time, except the Extra run script for batch jobs. Plexus overwrites the default container entrypoint, working directory and command, so you must include them in the container run scripts.

After filling in the container parameters, click Save, then Next.

Examples

Interactive application

The following shows an Ubuntu SSH interactive application. It configures the aac user, installs SSH, and sets environment variables in /etc/bash.bashrc.

Interactive application example

The prerun script contains the following:

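# Create the aac user and set its password from the secret key
useradd aac
echo "aac:$PLEXUS_SECRET_KEY" | chpasswd
# Print the credentials
echo "USERNAME: aac"
echo "PASSWORD: $PLEXUS_SECRET_KEY"
usermod -aG sudo aac
# Match the host user and group IDs when both are provided
if [ -n "$HOST_USER_ID" ] && [ -n "$HOST_GROUP_ID" ]; then
  groupmod -g "$HOST_GROUP_ID" aac
  usermod -u "$HOST_USER_ID" -g "$HOST_GROUP_ID" aac
fi
usermod --shell /bin/bash aac
# Copy the container environment (except HOME) into interactive bash sessions
export -p | grep -v "declare -x HOME" >> /etc/bash.bashrc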

The run script installs the SSH daemon and keeps the container running:

echo "starting ssh daemon..."
apt-get update && apt-get install -y openssh-server && service ssh start
while true; do sleep 100; done
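The busy loop above can also be written so that it exits promptly on a stop signal instead of waiting for the current sleep to finish. The sketch below is one way to do that. It assumes the container is stopped with SIGTERM, which is not confirmed for AAC, and keep_alive is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: block forever, but exit cleanly on SIGTERM.
# Assumption: the container is stopped with SIGTERM (not confirmed for AAC).
keep_alive() {
  interval="${1:-100}"
  # On SIGTERM, stop the pending sleep and exit with status 0.
  trap 'kill "$sleep_pid" 2>/dev/null; exit 0' TERM
  while true; do
    # Sleep in the background so that wait can be interrupted by the signal.
    sleep "$interval" &
    sleep_pid=$!
    wait "$sleep_pid"
  done
}

# In a run script, call it last, after starting the SSH daemon:
# keep_alive 100
```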

Batch application

The following shows a NAMD batch application configuration.

Batch application example

Important notes

  1. The cluster user directory is mounted at /home/aac and is used as the working directory in the container.
  2. Prerun script
    • The prerun script runs commands that are common to all containers in the application. It is mainly used to move data to the right folders inside the user's home workspace.
    • The prerun script can be configured in the default prerun script application setting, or it can be written in the job prerun step.
  3. Containers must have different names when used in the same application
    • Several instances of the same container can be configured in the same application, but each one must have a different name within the application.
    • Multi-container applications are available only in Kubernetes.
  4. The container run script must be set
    • The run script can be either a custom script or the default entrypoint of the container.
  5. Get the default container entrypoint
    • You can get the default container entrypoint by using the following command:

    docker inspect --format="{{.Config.Entrypoint}} {{.Config.Cmd}} {{.Config.WorkingDir}}" redis

It returns the entrypoint, command, and working directory. The entrypoint and command are lists, so they are printed in square brackets:

[docker-entrypoint.sh] [redis-server] /data

Where:

  • docker-entrypoint.sh is the entrypoint
  • redis-server is the command
  • /data is the working directory

Your run script must include the entrypoint followed by the command. If the image defines only one of them, use that one.

In some cases, the container runs a command that is located in the working directory or in the root directory / and depends on other files in that folder. If this causes problems, add cd /script_folder to the container's prerun script field.
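As a sketch of the rule above, the helper below assembles run script lines from the three values reported by docker inspect, skipping any that are empty. The compose_run_script name is hypothetical, and the redis values are the ones from the example above.

```shell
#!/bin/sh
# Hypothetical helper: print run script lines from an entrypoint, a command,
# and a working directory. Empty values are skipped.
compose_run_script() {
  entrypoint="$1"
  cmd="$2"
  workdir="$3"
  if [ -n "$workdir" ]; then
    printf 'cd %s\n' "$workdir"
  fi
  # Join entrypoint and command, or print whichever one is set.
  if [ -n "$entrypoint" ] && [ -n "$cmd" ]; then
    printf '%s %s\n' "$entrypoint" "$cmd"
  else
    printf '%s\n' "$entrypoint$cmd"
  fi
}

compose_run_script "docker-entrypoint.sh" "redis-server" "/data"
```

For the redis image, this prints cd /data followed by docker-entrypoint.sh redis-server.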

Multiple containers in an application

Only service-type applications (Is service enabled) support multiple containers. For non-service applications you can add only one container.

For services, multiple instances of the same container can be configured within the same application, but each container must have a unique name to be used inside the application.
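Before saving an application with several containers, the chosen names can be checked for duplicates. The sketch below does this locally with standard tools; find_duplicate_names is a hypothetical helper and is not part of AAC.

```shell
#!/bin/sh
# Hypothetical check: print any container name that appears more than once.
# Names are passed as arguments; the exit status is 1 if duplicates exist.
find_duplicate_names() {
  dups="$(printf '%s\n' "$@" | sort | uniq -d)"
  if [ -n "$dups" ]; then
    printf '%s\n' "$dups"
    return 1
  fi
}

find_duplicate_names redis-server-1 redis-server-2 && echo "names are unique"
```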

Review the application

Review the application details and make any edits. Click Create to create the application.

Connect to the container via SSH

After you submit a workload, it moves through the queue and starts when resources are available. When the status changes from Pending to Running, you can connect to it.

Open workload details

Open the workload from the dashboard. When it is Running, click the workload to open its details. On the right you will see Interactive Endpoints and Service Terminal.

Interactive Endpoints and Service Terminal

Service terminal

The Service Terminal is a websocket-based terminal. This method is not recommended for development. It is useful for quick admin or troubleshooting tasks but may time out or disconnect.

  • Limitations: Not suitable for long-running tasks or development.
  • Usage: Click Service Terminal to open the terminal. It may disconnect when idle.

Service Terminal

For development or a stable SSH connection, use Interactive Endpoints. You get a secure SSH connection using SSH keys. You can connect from any internet-connected machine.

Locate interactive endpoints

When the workload is Running, scroll to the Interactive Endpoints section on the right. You will see SSH Port and Secret Key.

Interactive Endpoints with SSH port and Secret Key

Click connect

Click Connect to reveal the SSH URL and Password for your container.

SSH URL and password

ssh -o strictHostKeyChecking=no -p <port> <username>@<hostname>

For example:

ssh -o strictHostKeyChecking=no -p 8003 aac@aac1.amd.com

The Password field contains the password for the initial connection.
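If you connect to several workloads, a small wrapper can fill in the placeholders from the SSH URL. The sketch below only prints the command; ssh_command is a hypothetical helper, and the values are the ones from the example above. Note that strictHostKeyChecking=no disables host key verification.

```shell
#!/bin/sh
# Hypothetical helper: print the SSH command for a given port, user, and host.
ssh_command() {
  printf 'ssh -o strictHostKeyChecking=no -p %s %s@%s\n' "$1" "$2" "$3"
}

ssh_command 8003 aac aac1.amd.com
```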

Connect from your local terminal

  1. Open a terminal on your machine.
  2. Copy the SSH URL and paste it into your terminal.

SSH command in local terminal

Example:

ssh -o strictHostKeyChecking=no -p 8003 aac@aac1.amd.com
  3. When prompted, enter the password from the Password field.

Once connected, you can run commands, check logs, and work inside the container.