Orchestration of Workflows in Converged Cloud and HPC Environments

By Georgios Saloustros June 25, 2024

Multi-Site Workflow Orchestration in the DAFAB Project

DAFAB will design and implement a workflow orchestration system that improves multi-site application deployment and data discovery. The workflow system will enable applications to express their computations as a graph and to declare the data needed at each step in a high-level query language. Workflows will then execute across multiple sites, whether these are cloud (Kubernetes) or high-performance computing (Slurm) environments. By providing transparent data access and seamless execution of workflow stages, the system shifts the burden of application deployment and data discovery from the developer to the platform itself, which should shorten development time.

Simplifying Workflow Specification and Execution

Running applications on the workflow system involves two straightforward steps:

  1. Containerize each workflow step.
  2. Describe the workflow as a graph, either in a data serialization language such as YAML or in code such as Python.
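
As an illustration of step 2, the following minimal sketch describes a three-step workflow in Python and renders it as an Argo Workflows DAG manifest. The step names, the image references, and the helpers build_workflow and execution_order are hypothetical and chosen only for this example; the description format that DAFAB exposes to users may differ.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: step name -> (container image, upstream steps).
STEPS = {
    "fetch": ("example.org/dafab/fetch:latest", []),
    "process": ("example.org/dafab/process:latest", ["fetch"]),
    "publish": ("example.org/dafab/publish:latest", ["process"]),
}


def build_workflow(name, steps):
    """Render the step graph as an Argo Workflows manifest (plain dict)."""
    # One container template per containerized step.
    templates = [
        {"name": step, "container": {"image": image}}
        for step, (image, _deps) in steps.items()
    ]
    # One DAG task per step; edges are expressed through "dependencies".
    tasks = []
    for step, (_image, deps) in steps.items():
        task = {"name": step, "template": step}
        if deps:
            task["dependencies"] = list(deps)
        tasks.append(task)
    templates.append({"name": "main", "dag": {"tasks": tasks}})
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": f"{name}-"},
        "spec": {"entrypoint": "main", "templates": templates},
    }


def execution_order(steps):
    """Return one valid execution order; raises CycleError on a cyclic graph."""
    graph = {step: set(deps) for step, (_image, deps) in steps.items()}
    return list(TopologicalSorter(graph).static_order())


workflow = build_workflow("eo-pipeline", STEPS)
order = execution_order(STEPS)

if __name__ == "__main__":
    print(json.dumps(workflow, indent=2))
    print("execution order:", order)
```

Because JSON is a subset of YAML, the printed manifest can be saved to a file and submitted to an Argo Workflows installation; checking the graph for cycles locally catches a malformed description before submission.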

To achieve this, the system leverages three core technologies:

  • Kubernetes (K8s) and Argo Workflows: Manage the deployment and lifecycle of each workflow step.
  • Rucio Catalog: Facilitates dataset discovery and access.
  • Knot front-end: Offers a unified front-end to users for writing, executing, and monitoring workflows.

K8s-based Multi-Site Workflow Execution

The orchestration system will run each workflow step on Kubernetes (K8s) for cloud environments or Slurm for HPC sites. For the latter, the system will extend the capabilities of the HPK project from FORTH, which facilitates the execution of K8s jobs within a Slurm environment. This dual compatibility ensures that workflows can leverage the strengths of both cloud and HPC resources effectively.

Intelligent Data Management with Rucio

A key feature of the workflow system is its integration with the Rucio catalog. Applications can express the data needed through Rucio’s high-level query language. Rucio discovers and provides the location of the datasets before each workflow step. The workflow system will optimize the execution of each step based on data locality and computing resource availability.
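
To make the idea of locality-aware placement concrete, here is a minimal sketch of one possible policy: among the sites with enough free CPUs, pick the one that needs the fewest bytes transferred, and break ties in favour of the site with more free capacity. Everything in it is an assumption for illustration: the Site class, the choose_site function, the site and dataset names, and the policy itself are hypothetical. In the real system, the replica locations would come from Rucio, and the scheduling policy DAFAB adopts may be different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    backend: str    # "kubernetes" (cloud) or "slurm" (HPC)
    free_cpus: int


def bytes_to_transfer(site, datasets, sizes, replicas):
    """Total size of the requested datasets that have no replica at the site."""
    return sum(
        sizes[d] for d in datasets if site.name not in replicas.get(d, set())
    )


def choose_site(sites, datasets, sizes, replicas, cpus_needed):
    """Pick the feasible site with the least data to move; None if no site fits."""
    candidates = [s for s in sites if s.free_cpus >= cpus_needed]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda s: (bytes_to_transfer(s, datasets, sizes, replicas), -s.free_cpus),
    )


# Hypothetical inventory; in practice the replica map would be filled
# from a catalog lookup performed before each workflow step.
SITES = [
    Site("cloud-a", "kubernetes", free_cpus=16),
    Site("hpc-b", "slurm", free_cpus=256),
]
SIZES = {"scene-1": 40_000_000_000, "mask-1": 1_000_000_000, "thumb-1": 5_000_000}
REPLICAS = {
    "scene-1": {"hpc-b"},
    "mask-1": {"cloud-a", "hpc-b"},
    "thumb-1": {"cloud-a"},
}

if __name__ == "__main__":
    site = choose_site(SITES, ["scene-1", "mask-1"], SIZES, REPLICAS, cpus_needed=8)
    print(site.name, site.backend)
```

With this inventory, a step that reads scene-1 and mask-1 is placed on the Slurm site, which already holds both datasets, while a step that reads only thumb-1 is placed on the Kubernetes site.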

Enhancing User Experience with Knot

FORTH has also developed Knot, a web-based environment that complements the workflow system. Knot gives users an intuitive interface for working on Kubernetes. Through Knot, users can access landing pages to launch notebooks, design workflows, and specify execution parameters. Its dashboard is a comprehensive management tool that handles user access, storage integration, service provisioning, and identity management with OAuth 2.0/OIDC-compatible applications.

By integrating these technologies, the DAFAB project’s workflow orchestration system simplifies complex processes and accelerates the development and deployment of applications across diverse computing environments.

Summary & Follow-up

The DAFAB project aims to advance workflow management and execution for Earth Observation data processing. By bridging the gap between cloud and HPC environments and improving data accessibility and efficiency, the system is expected to substantially simplify application development and deployment.

Stay tuned for more updates as DAFAB continues to push the boundaries of what is possible in workflow orchestration! If you are interested in more details, please get in touch.

Georgios Saloustros
Institute of Computer Science (ICS)
Foundation for Research and Technology - Hellas (FORTH)