Plexus

Plexus is a web-based platform that simplifies access to and usage of the infrastructure available in AMD Accelerator Cloud (AAC). It provides a ready-to-use set of applications for a wide range of AMD accelerators, including solutions for AI/ML workloads, high-performance computing, data analytics, and visualization.

Platform capabilities

You can run workloads through containerized applications across different orchestration environments:

  • Slurm Clusters: Execute containerized workloads on traditional HPC infrastructure with job scheduling and resource management.
  • Kubernetes Clusters: Deploy and scale containerized applications using modern cloud-native orchestration.

The platform handles container deployment and cluster management, so you can focus on your computational tasks and make full use of the available infrastructure and AMD accelerators. To sign in, see Sign in to AAC.

Overview

Dashboard

The Dashboard is the central hub for managing your workloads and accessing platform resources. It gives you an overview of your activity and available tools through three main sections:

Applications Panel: A curated set of pre-configured applications optimized for AMD accelerators. You can launch AI/ML frameworks (PyTorch, TensorFlow) running on the ROCm software stack, HPC simulation tools, data analytics platforms, and visualization software. Each application shows compatibility information, resource requirements, and quick-start options.

Recent Workloads: Grouped information about your recently submitted jobs and their current status (pending, running, completed, failed).

Teams: All teams and projects you belong to, along with resource quotas, shared storage access, and collaborative workspaces. You can switch between organizational contexts and view team-specific applications and compute allocations.

Dashboard page

Workloads

The Workloads page shows all your submitted jobs and offers monitoring and management capabilities.

Job Listing: Displays a table of all workloads with job name, submission time, execution duration, resource allocation, and current status. Each entry shows the target cluster (Slurm/Kubernetes) and provides quick access to job details, logs, and results.

Status Tracking: Real-time status indicators show the current state of each workload:

  • Pending: Jobs waiting in the queue for resource allocation.
  • Running: Active jobs currently executing on compute nodes.
  • Completed: Jobs that finished successfully, with results available.
  • Failed: Jobs that encountered errors, with diagnostic information available.
  • Cancelled: Jobs terminated by the user or the system.
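If you track workload states in your own scripts, the statuses above can be modeled as a small enumeration. This is a minimal sketch, not a Plexus API: the `WorkloadStatus` class, the `TERMINAL` set, and the `is_terminal` helper are hypothetical names, and treating Completed, Failed, and Cancelled as final states is an assumption drawn from the descriptions above.

```python
from enum import Enum


class WorkloadStatus(Enum):
    """Hypothetical names mirroring the statuses shown on the Workloads page."""

    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumption: a job in one of these states will not change state again.
TERMINAL = {
    WorkloadStatus.COMPLETED,
    WorkloadStatus.FAILED,
    WorkloadStatus.CANCELLED,
}


def is_terminal(status: WorkloadStatus) -> bool:
    """Return True when the job has stopped and no longer needs polling."""
    return status in TERMINAL
```

A monitoring loop could use `is_terminal` to decide when to stop checking a job.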

Advanced Filtering: You can filter and search to navigate large job histories:

  • Filter by status, date range, application type, or cluster.
  • Search by job name.
  • Sort by different criteria.
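The same filter, search, and sort logic can be sketched on exported job records. This is illustrative only: the `Job` record, its field names, the sample data, and the `filter_jobs` function are all hypothetical and do not describe a Plexus schema or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Job:
    # Hypothetical record; field names are illustrative, not a Plexus schema.
    name: str
    status: str
    cluster: str
    submitted: datetime


def filter_jobs(
    jobs: Iterable[Job],
    status: Optional[str] = None,
    cluster: Optional[str] = None,
    since: Optional[datetime] = None,
    name_contains: Optional[str] = None,
) -> List[Job]:
    """Keep jobs matching every filter that is set, newest first."""
    matches = [
        job
        for job in jobs
        if (status is None or job.status == status)
        and (cluster is None or job.cluster == cluster)
        and (since is None or job.submitted >= since)
        and (name_contains is None or name_contains.lower() in job.name.lower())
    ]
    return sorted(matches, key=lambda job: job.submitted, reverse=True)


# Made-up sample data for the sketch.
jobs = [
    Job("train-resnet", "running", "slurm-a", datetime(2024, 5, 2)),
    Job("train-bert", "failed", "k8s-b", datetime(2024, 5, 3)),
    Job("render-scene", "completed", "slurm-a", datetime(2024, 4, 20)),
]
recent_training = filter_jobs(jobs, since=datetime(2024, 5, 1), name_contains="train")
print([job.name for job in recent_training])
```

Each filter is optional, so leaving an argument unset mirrors leaving that filter empty in the interface.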

Workloads page

Applications

The Applications page is a catalog of all available containerized applications, with detailed information and quick access to launch workloads.

Application Catalog: Displays a grid view of all pre-configured applications available on the platform. Each application tile shows the application name, version, description, supported AMD ROCm version, and resource requirements. Applications are categorized by domain (AI/ML, HPC, Data Analytics, Visualization) for easy navigation.

Application Details: Click on any application to view comprehensive information including:

  • Supported frameworks and libraries (PyTorch, TensorFlow, ROCm, OpenMP).
  • Hardware compatibility (GPU types, memory requirements, CPU specifications).
  • Pre-installed software packages and dependencies.

Filtering and Search: You can filter and search to find relevant applications:

  • Filter by criteria such as name, category, or architecture.
  • Search by application name or keywords.
  • Bookmark frequently used applications for quick access.

Applications page

Clusters

The Clusters page lists all available compute clusters so you can choose where to run your workloads.

Cluster Overview: Lists all available clusters with cluster name, type (Slurm/Kubernetes), total nodes, and available resources.

Filtering and Search: You can navigate the cluster inventory in several ways:

  • Search by cluster name.
  • Sort by cluster name, resource manager, or accelerator type.
  • View clusters by geographic location or organizational access.

Clusters page

Queues

The Queues page shows job scheduling queues across all clusters so you can understand queue limits and plan your workload submissions.

Queue Overview: Lists all available queues with queue name, associated cluster, and queue limits (time, resources).
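When planning a submission, you can compare a job's request against a queue's limits before you submit. The sketch below assumes a queue exposes a maximum run time and a maximum node count. The `QueueLimits` record, its fields, the `fits_queue` function, and the sample values are hypothetical and are not taken from Plexus.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class QueueLimits:
    # Hypothetical record; real queues may expose different limits.
    name: str
    cluster: str
    max_walltime: timedelta
    max_nodes: int


def fits_queue(queue: QueueLimits, walltime: timedelta, nodes: int) -> bool:
    """Return True when the request stays within both queue limits."""
    return walltime <= queue.max_walltime and nodes <= queue.max_nodes


# Made-up queue for the sketch.
short_queue = QueueLimits(
    name="short",
    cluster="slurm-a",
    max_walltime=timedelta(hours=2),
    max_nodes=4,
)
print(fits_queue(short_queue, timedelta(hours=1), nodes=2))
print(fits_queue(short_queue, timedelta(hours=8), nodes=2))
```

A request that exceeds either limit would need a different queue or a smaller resource request.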

Filtering and Search: You can navigate queue information in several ways:

  • Filter by criteria such as cluster or queue type.
  • Search by queue name or associated cluster.
  • Sort by different criteria.

Queues page

Files

The Files page is where you upload, organize, and manage data files for your workloads across all clusters and storage.

File Browser: A hierarchical view of your storage with folder navigation and file listing (name, size, modification date, file type, storage location). You can navigate directories, create folders, and organize files in a familiar interface.

File Upload: Multiple upload options to accommodate different data transfer needs:

  • Drag & Drop: Simple interface for uploading single or multiple files.
  • Bulk Upload: Support for large files and folder structures.

File Management: File operations include:

  • View details, delete, and download.
  • Bulk deletion.

Search and Filtering: Basic file discovery tools:

  • Search by filename.
  • Filter by cluster location.
  • Sort by criteria such as name, size, or creation date.

Files page