
Worker

A Worker Resource in DataOS is a long-running process responsible for performing specific tasks or computations indefinitely. To understand the key characteristics and what differentiates a Worker from a Workflow and a Service, refer to the following link: Core Concepts.

Worker in the Data Product Lifecycle

Worker Resources are integral to the build phase of the Data Product Lifecycle. They form part of the 'code' component in the data product definition and are essential for carrying out prolonged transformations. They are particularly useful when your transformation involves:

  • Indefinite Execution: Continuously processing or transforming stream or batch data without a defined endpoint. For example, a Worker processing live sensor data from IoT devices and storing it in a dataset.
  • Child Processes: Creating child processes for a main process, allowing for modular and scalable task execution. For example, a Worker handling background jobs in a web application.
  • Independent Processing: Performing long-running transformations without requiring external network communication. For example, a Worker continuously monitoring independent data streams.

Figure: Worker Resource in DataOS

Structure of Worker manifest

Worker manifest

worker_manifest_structure.yml
# RESOURCE META SECTION
# Attributes common across all DataOS Resources
name: benthos3-worker-sample # Name of the Worker Resource (mandatory)
version: v1beta # Version of the Worker Resource (mandatory)
type: worker # Type of the resource, in this case, it is a worker (optional)
tags:
    - worker 
    - dataos:type:resource 
    - dataos:resource:worker 
    - dataos:layer:user 
    - dataos:workspace:public 
description: Random User Console # Description of the Worker Resource (optional)

# WORKER-SPECIFIC SECTION
# Attributes specific to Worker resource-type
worker: 
    tags: # Worker-specific tags
      - worker 
    replicas: 1 # Specifies the number of worker replicas to run
    stack: benthos-worker:3.0 # Specifies the stack name and version for the worker
    logLevel: DEBUG # Sets the logging level to DEBUG
    compute: runnable-default # Defines the Compute Resource to be used
    resources:
        requests:
            cpu: 100m # Requests 100 millicores of CPU
            memory: 128Mi # Requests 128 MiB of memory
        limits:
            cpu: 1000m # Limits the worker to 1000 millicores of CPU
            memory: 1024Mi # Limits the worker to 1024 MiB of memory

# STACK-SPECIFIC SECTION
# Attributes specific to the chosen Stack
    stackSpec:
        input:
            http_client:
                headers:
                    Content-Type: application/octet-stream # Sets the content type header
                url: https://randomuser.me/api/ # URL to fetch data from
                verb: GET # HTTP method to use (GET)
        output:
            stdout:
                # Example delimiter (comment kept outside the block scalar so it is not read as part of the value)
                codec: |
                    delim:
                    -----------GOOD------------

First Steps

A Worker Resource in DataOS is created by applying its manifest file with the DataOS CLI. To learn more about this process, see First steps.

Configuration

Workers can be configured to autoscale and match varying workload demands, reference pre-defined Secrets and Volumes, and more. The specific configurations may vary depending on the use case. For a detailed breakdown of the configuration options and attributes, please refer to the documentation: Attributes of Worker manifest.
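
As a rough illustration of the autoscaling option mentioned above, the fragment below sketches how such a block might sit inside the Worker-specific section. The `autoScaling` attribute and its child keys are assumptions modeled on common Kubernetes-style autoscaling settings, and the threshold values are placeholders; confirm the exact names and allowed values in Attributes of Worker manifest before use.

```yaml
# Hypothetical fragment: attribute names under `autoScaling` are assumed,
# not confirmed by this page. Verify against "Attributes of Worker manifest".
worker:
    replicas: 1 # Starting number of replicas
    autoScaling:
        enabled: true # Assumed switch to turn autoscaling on
        minReplicas: 1 # Assumed lower bound on replicas
        maxReplicas: 3 # Assumed upper bound on replicas
        targetCPUUtilizationPercentage: 70 # Placeholder CPU threshold
        targetMemoryUtilizationPercentage: 80 # Placeholder memory threshold
```

If the attributes behave as assumed, the Worker would scale between the minimum and maximum replica counts as CPU or memory utilization crosses the configured targets, while the `resources` requests and limits continue to apply to each replica.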