What's Plexus

Plexus is a web-based platform that simplifies access to, and use of, the infrastructure available in the AMD Accelerator Cloud (AAC). It provides a ready-to-use set of applications for a wide range of AMD accelerators, covering AI/ML workloads, high-performance computing, data analytics, and visualization.

Platform Capabilities

Plexus enables users to run workloads through containerized applications across different orchestration environments:

  • Slurm Clusters: Execute containerized workloads on traditional HPC infrastructure with job scheduling and resource management.
  • Kubernetes Clusters: Deploy and scale containerized applications using modern cloud-native orchestration.

The platform abstracts the complexity of container deployment and cluster management, allowing users to focus on their computational tasks while leveraging the full power of the available infrastructure and AMD accelerators.

AAC login page

Overview

Dashboard

The Dashboard serves as the central hub for users to manage their computational workloads and access platform resources. It provides a comprehensive overview of the user's activity and available tools through three main sections:

Applications Panel: Displays a curated set of pre-configured applications optimized for AMD accelerators. Users can quickly launch AI/ML frameworks and software stacks (PyTorch, TensorFlow, ROCm), HPC simulation tools, data analytics platforms, and visualization software. Each application shows compatibility information, resource requirements, and quick-start options.

Recent Workloads: Shows grouped information about recently submitted jobs and their current status (pending, running, completed, or failed).

Teams: Lists all teams and projects the user belongs to, along with associated resource quotas, shared storage access, and collaborative workspaces. This section enables users to switch between different organizational contexts and view team-specific applications and compute allocations.

Dashboard page

Workloads

The Workloads page provides a comprehensive view of all submitted jobs across the platform, offering advanced monitoring and management capabilities for user workloads.

Job Listing: Displays a detailed table of all workloads with essential information, including job name, submission time, execution duration, resource allocation, and current status. Each entry shows the target cluster (Slurm or Kubernetes) and provides quick access to job details, logs, and results.

Status Tracking: Real-time status indicators show the current state of each workload:

  • Pending: Jobs waiting in queue for resource allocation.
  • Running: Active jobs currently executing on compute nodes.
  • Completed: Successfully finished jobs with available results.
  • Failed: Jobs that encountered errors, with diagnostic information.
  • Cancelled: User- or system-terminated jobs.

Advanced Filtering: Multiple filter options allow users to efficiently navigate large job histories:

  • Filter by status, date range, application type, or cluster.
  • Search by job name.
  • Sort by different criteria.
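The workload states and filters described above can be pictured as a small data model. The following is a minimal Python sketch for illustration only: it is not the Plexus API, and the names `Status`, `Workload`, `filter_workloads`, and the sample job and cluster names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Iterable, List, Optional


class Status(Enum):
    """The five workload states listed on the Workloads page."""
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


@dataclass(frozen=True)
class Workload:
    """Hypothetical job record; field names are assumptions, not Plexus fields."""
    name: str
    cluster: str
    status: Status
    submitted: datetime


def filter_workloads(
    workloads: Iterable[Workload],
    status: Optional[Status] = None,
    cluster: Optional[str] = None,
    name_contains: Optional[str] = None,
) -> List[Workload]:
    """Apply the optional filters, then sort newest submission first."""
    selected = [
        w for w in workloads
        if (status is None or w.status is status)
        and (cluster is None or w.cluster == cluster)
        and (name_contains is None or name_contains.lower() in w.name.lower())
    ]
    return sorted(selected, key=lambda w: w.submitted, reverse=True)


# Illustrative sample data (invented job and cluster names).
jobs = [
    Workload("train-llm", "slurm-a", Status.RUNNING, datetime(2024, 5, 2, 9, 0)),
    Workload("cfd-run", "k8s-b", Status.FAILED, datetime(2024, 5, 1, 8, 0)),
    Workload("train-vision", "slurm-a", Status.COMPLETED, datetime(2024, 5, 3, 10, 0)),
]

if __name__ == "__main__":
    for job in filter_workloads(jobs, cluster="slurm-a"):
        print(job.name, job.status.value)
```

Each filter is optional and the filters combine with a logical AND, which mirrors how several filter controls on a listing page are typically applied together.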

Workloads page

Applications

The Applications page serves as a comprehensive catalog of all available containerized applications, providing users with detailed information and easy access to launch computational workloads.

Application Catalog: Displays a grid view of all pre-configured applications available on the platform. Each application tile shows the application name, version, description, supported AMD ROCm version, and resource requirements. Applications are categorized by domain (AI/ML, HPC, Data Analytics, Visualization) for easy navigation.

Application Details: Click on any application to view comprehensive information, including:

  • Supported frameworks and libraries (PyTorch, TensorFlow, ROCm, OpenMP).
  • Hardware compatibility (GPU types, memory requirements, CPU specifications).
  • Pre-installed software packages and dependencies.

Filtering and Search: Advanced filtering capabilities allow users to find relevant applications:

  • Filter by criteria such as name, category, or architecture.
  • Search by application name or keywords.
  • Bookmark frequently used applications for quick access.

Applications page

Clusters

The Clusters page provides an overview of all available compute clusters, enabling users to make informed decisions about workload placement and resource utilization.

Cluster Overview: Displays a comprehensive list of all available clusters, showing essential information including cluster name, type (Slurm or Kubernetes), total nodes, and available resources.

Filtering and Search: Users can efficiently navigate the cluster inventory using:

  • Search by cluster name.
  • Sort by cluster name, resource manager, or accelerator type.
  • View clusters by geographic location or organizational access.

Clusters page

Queues

The Queues page provides detailed visibility into job scheduling queues across all clusters, helping users understand queue dynamics and optimize their workload submission strategies.

Queue Overview: Displays a comprehensive list of all available queues, showing essential information including queue name, associated cluster, and queue limits (time and resources).

Filtering and Search: Users can efficiently navigate queue information using:

  • Filter by criteria such as cluster or queue type.
  • Search by queue name or associated cluster.
  • Sort by different criteria.
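To illustrate how queue limits can inform where a job is submitted, here is a minimal Python sketch that checks a job's requested time and resources against each queue's limits. It assumes a queue exposes a maximum wall time and a maximum GPU count; the names `QueueLimits`, `fits`, `eligible_queues`, and the sample queues are hypothetical and do not reflect the actual Plexus data model.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class QueueLimits:
    """Hypothetical queue record; field names are assumptions."""
    name: str
    cluster: str
    max_walltime: timedelta  # time limit
    max_gpus: int            # resource limit


def fits(queue: QueueLimits, walltime: timedelta, gpus: int) -> bool:
    """Return True if the request stays within both queue limits."""
    return walltime <= queue.max_walltime and gpus <= queue.max_gpus


def eligible_queues(
    queues: Iterable[QueueLimits], walltime: timedelta, gpus: int
) -> List[QueueLimits]:
    """Keep the queues that can accept the request, sorted by queue name."""
    return sorted(
        (q for q in queues if fits(q, walltime, gpus)),
        key=lambda q: q.name,
    )


# Illustrative sample data (invented queue and cluster names).
queues = [
    QueueLimits("short", "slurm-a", timedelta(hours=1), 8),
    QueueLimits("long", "slurm-a", timedelta(hours=48), 4),
]

if __name__ == "__main__":
    for q in eligible_queues(queues, walltime=timedelta(hours=2), gpus=2):
        print(q.name, q.cluster)
```

A request that exceeds either limit is excluded, so a two-hour job is rejected by a one-hour queue even if the queue has enough GPUs.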

Queues page

Files

The Files page provides a comprehensive file management system that allows users to upload, organize, and manage data files required for their computational workloads across all clusters and storage systems.

File Browser: Displays a hierarchical view of user storage with folder navigation and file listing. Shows essential file information including name, size, modification date, file type, and storage location. Users can navigate through directories, create new folders, and organize files in a familiar interface similar to desktop file managers.

File Upload: Multiple upload options accommodate different data transfer needs:

  • Drag & Drop: Simple interface for uploading single or multiple files.
  • Bulk Upload: Support for large files and folder structures.

File Management: Comprehensive file operations, including:

  • View details, download, and delete individual files.
  • Delete multiple files at once (bulk deletion).

Search and Filtering: Basic file discovery tools:

  • Search by filename.
  • Filter by cluster location.
  • Sort by various criteria, including name, size, or creation date.

Files page