
Poiesis Architecture

Database

Poiesis uses MongoDB to store task data. Instead of storing only the raw task document, it wraps the document with extra, partly redundant fields that make the data easier to query and analyze.

json
{
    "id": "123",
    "task_name": "test",
    "task_status": "RUNNING",
    "data": {
        "task_id": "123",
        "task_name": "test",
        "task_status": "RUNNING"
    }
}

INFO

Fields such as task_name and task_status are redundant, but they are kept because they are used to filter tasks in the database.
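The following is a minimal Python sketch of why the duplicated fields help. It is an illustration only, not Poiesis code: `wrap_task` and `matches` are hypothetical helpers, and the in-memory list stands in for a MongoDB collection. The point is that a filter can stay flat (for example `{"task_status": "RUNNING"}`) rather than reaching into the nested `data` document.

```python
from typing import Any


def wrap_task(task: dict[str, Any]) -> dict[str, Any]:
    """Wrap a task document, copying selected fields to the top level (hypothetical helper)."""
    return {
        "id": task["task_id"],
        "task_name": task["task_name"],
        "task_status": task["task_status"],
        "data": task,  # the original document is kept intact
    }


def matches(record: dict[str, Any], query: dict[str, Any]) -> bool:
    """Equality-only match on top-level fields, in the spirit of a simple MongoDB filter."""
    return all(record.get(key) == value for key, value in query.items())


# In-memory stand-in for the task collection.
records = [
    wrap_task({"task_id": "123", "task_name": "test", "task_status": "RUNNING"}),
    wrap_task({"task_id": "124", "task_name": "test", "task_status": "COMPLETE"}),
]

# Filter on the redundant top-level field instead of on data.task_status.
running = [r["id"] for r in records if matches(r, {"task_status": "RUNNING"})]
print(running)
```

With a real MongoDB driver, the same flat dictionary would typically be passed as the query filter, and the top-level fields are also the natural candidates for indexes.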

Poiesis also stores extra information per task, such as user_id (the unique ID from the OIDC provider) and service_hash (the hash of the service document at the time the task is created). To learn more about the fields, refer to the TaskSchema.
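As a rough illustration of how a field like service_hash could be derived, the sketch below hashes a canonical JSON form of a service document. The source does not say which hash function or serialization Poiesis uses, so SHA-256, the sorted-key JSON encoding, and the helper names `service_hash` and `with_metadata` are all assumptions made for this example.

```python
import hashlib
import json
from typing import Any


def service_hash(service_doc: dict[str, Any]) -> str:
    """Hash a service document (assumed scheme: SHA-256 over sorted-key JSON)."""
    # Sorting keys makes the hash independent of key order in the dict.
    canonical = json.dumps(service_doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def with_metadata(
    record: dict[str, Any], user_id: str, service_doc: dict[str, Any]
) -> dict[str, Any]:
    """Return a copy of the record with per-task metadata attached (hypothetical helper)."""
    return {**record, "user_id": user_id, "service_hash": service_hash(service_doc)}


# Example values are placeholders, not real Poiesis data.
record = with_metadata(
    {"id": "123", "task_status": "RUNNING"},
    user_id="example-oidc-subject",
    service_doc={"name": "example-service", "version": "1"},
)
print(record["service_hash"])
```

Storing the hash alongside the task makes it possible to tell later whether the service document has changed since the task was created, without storing the whole document per task.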

Task creation


INFO

The diagram above shows the flow of task creation in Poiesis. It is not exact, but it gives a high-level overview of the process.

  1. Initialization:

    • The User submits a task request to the API.
    • The API generates a unique ID (UUID) for the task and creates a corresponding record in MongoDB (TaskDB). This database entry is the central source of information for the task state.
    • Once the task is persisted in the database, the API triggers the creation of the main Torc Job in Kubernetes.
  2. Data Preparation:

    • Torc first requests the creation of a Persistent Volume Claim (PVC) as specified by the user.
    • Torc then launches a sub-job/pod called TIF (Task Input Fetcher).
    • TIF downloads the required input data and writes it to the PVC.
    • Upon completion, TIF sends a message to a task-specific Redis Channel indicating that the input data is ready.
    • Torc listens to this channel and proceeds once notified.
  3. Execution:

    • Torc launches TExAM (Task Executor And Monitor).
    • TExAM is responsible for creating and launching the actual Task Executor pods (TE).
    • TExAM ensures the data from the PVC (both input and space for output) is correctly mounted into the Task Executor pods.
    • TExAM monitors the lifecycle of all Task Executor pods.
    • The Task Executor pods perform the core work, reading input from and writing output to the PVC.
    • Once all Task Executors have finished, TExAM signals completion via the Redis Channel.
    • Torc receives this notification.
  4. Data Output:

    • Torc launches the final sub-job/pod, TOF (Task Output Fetcher).
    • TOF reads the resulting output data generated by the Task Executors from the PVC.
    • TOF uploads this data to the final User Output Location specified in the initial request.
  5. Status and Logging (Ongoing):

    • Throughout the process, both Torc and TExAM periodically write the task's status and relevant logs (system logs and executor logs) to the central MongoDB (TaskDB).
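The five steps above can be sketched as a small, single-process Python simulation. This is not the Poiesis implementation: plain dictionaries stand in for MongoDB (TaskDB), the PVC, and the user output location; a `deque` per channel stands in for Redis; the message names (`inputs_ready`, `executors_done`), the status values other than RUNNING, and every function name are assumptions made for illustration. In the real system each stage runs as a separate Kubernetes job or pod and Torc blocks on the channel; here the stages run synchronously, so the message is already queued when Torc reads it.

```python
import uuid
from collections import deque

task_db = {}    # stand-in for MongoDB (TaskDB)
channels = {}   # stand-in for Redis channels: name -> queued messages
uploaded = {}   # stand-in for the user's output location


def publish(channel, message):
    channels.setdefault(channel, deque()).append(message)


def wait_for(channel, expected):
    # A real subscriber would block here until a message arrives.
    message = channels[channel].popleft()
    if message != expected:
        raise RuntimeError(f"unexpected message on {channel}: {message}")


def update(task_id, status, log_line):
    # Step 5: status and logs are written to TaskDB throughout.
    task_db[task_id]["task_status"] = status
    task_db[task_id]["logs"].append(log_line)


def tif(task_id, pvc, channel):
    # Step 2: fetch inputs onto the PVC, then notify Torc.
    pvc["inputs"] = list(task_db[task_id]["data"]["inputs"])
    publish(channel, "inputs_ready")


def texam(task_id, pvc, channel):
    # Step 3: run each executor against the PVC, monitor it, then notify Torc.
    pvc["outputs"] = []
    for executor in task_db[task_id]["data"]["executors"]:
        pvc["outputs"].append(f"{executor}:{len(pvc['inputs'])} input(s)")
        update(task_id, "RUNNING", f"executor {executor} finished")
    publish(channel, "executors_done")


def tof(task_id, pvc):
    # Step 4: copy outputs from the PVC to the user's output location.
    location = task_db[task_id]["data"]["output_location"]
    uploaded[location] = list(pvc["outputs"])


def torc(task_id):
    channel = f"task:{task_id}"  # task-specific channel
    pvc = {}                     # stand-in for the Persistent Volume Claim
    update(task_id, "RUNNING", "pvc created")
    tif(task_id, pvc, channel)
    wait_for(channel, "inputs_ready")
    update(task_id, "RUNNING", "inputs ready")
    texam(task_id, pvc, channel)
    wait_for(channel, "executors_done")
    tof(task_id, pvc)
    update(task_id, "COMPLETE", "outputs uploaded")


def create_task(request):
    # Step 1: generate an ID, persist the record, then start Torc.
    task_id = str(uuid.uuid4())
    task_db[task_id] = {"id": task_id, "task_status": "QUEUED", "logs": [], "data": request}
    torc(task_id)
    return task_id


task_id = create_task(
    {"inputs": ["in.txt"], "executors": ["step-1", "step-2"], "output_location": "user-output"}
)
print(task_db[task_id]["task_status"])
print(task_db[task_id]["logs"])
```

The sketch keeps the ordering guarantees described in the list: the record exists before Torc starts, executors only run after the input notification, and the output stage only runs after the executor notification.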

Released under the Apache License 2.0.