Worker¶
A Worker Resource in DataOS is a long-running process responsible for performing specific tasks or computations indefinitely.
Key Characteristics¶
- Continuous Execution: Workers are built to run perpetually, performing their assigned tasks without a defined end time.
- No Ingress: Unlike Services, Workers do not expose ingress ports.
- Throughput-Based: Workers are throughput-based and do not require synchronous responses.
- Lightweight: Workers are lightweight compared to Services, as they do not require multiple open network ports. This makes them faster to deploy and more efficient.
- Specialized Execution: A Worker is a self-contained, independent entity, ideal for executing a specific task within a larger application with focused functionality.
- Autoscalability: Workers can be autoscaled to handle larger workloads, making them highly adaptable.
- Robustness: Workers are perfect for use cases where robustness and continuous execution are essential.
Workflow vs. Service vs. Worker¶
Workflow, Service, and Worker are distinct DataOS Resources, each with a unique role in the ecosystem. Data developers often face the question of when to use which. The following table compares the three, helping developers understand their distinct characteristics and optimal use cases within DataOS.
Characteristic | Workflow | Service | Worker |
---|---|---|---|
Overview | Workflows orchestrate sequences of tasks, jobs, or processes, terminating upon successful completion or failure. | Services are long-running processes that continuously operate, serve, and process API requests. | Workers execute specific tasks or computations continuously, without a defined end time. |
Execution Model | Workflows process data in discrete chunks, following predefined DAGs (Directed Acyclic Graphs). | Services expose API endpoints and ingress ports for external data or request intake. They don't have DAGs. | Workers perform continuous task execution independently, without synchronous inputs or ingress ports. |
Data Dependency | Workflows follow predefined orders or DAGs, depending on data input sequences. | Services rely on incoming data through ingress ports for logic execution. | Workers are throughput-based and do not require synchronous inputs or ingress ports. |
Stack Orchestration | Yes | Yes | Yes |
Scope | Workspace-level | Workspace-level | Workspace-level |
Use Cases | 1. Batch Data Processing Pipelines: Ideal for orchestrating complex data processing pipelines. 2. Scheduled Jobs: Perfect for automating tasks at specific intervals, such as data backups and ETL processes. | 1. API Endpoints: Used to create API endpoints for various purposes, such as data retrieval and interaction with external systems. 2. User Interfaces: Suitable for building interfaces that interact with data or services. | 1. Continuous Processing: Perfect for tasks like real-time analytics and event-driven operations. 2. Independence: Ideal for creating independent systems that perform specific tasks indefinitely. |
How to create a Worker?¶
Data developers can create a Worker Resource using a YAML configuration file via the DataOS CLI.
Worker YAML configuration¶
A Worker Resource YAML configuration consists of two distinct sections:
- Resource meta Section: This section comprises attributes that are shared among all Resource-types.
- Worker-specific Section: The Worker-specific section contains attributes unique to the Worker Resource.
The configuration of each of these sections is provided in detail below.
Configuring the Resource meta section¶
In DataOS, a Worker is categorized as a Resource-type. The YAML configuration file for a Worker Resource includes a Resource meta section, which encompasses attributes shared among all Resource-types.
The following YAML excerpt illustrates the attributes that are specified within this section:
name: ${{my-worker}}
version: v1beta
type: worker
layer: user
tags:
- ${{dataos:type:resource}}
- ${{dataos:resource:worker}}
description: ${{this worker resource is for a data product}}
owner: ${{iamgroot}}
worker: # worker-specific section
${{worker-specific Attributes}}
For additional information about the attributes within the Resource meta section, please consult the Attributes of Resource meta section.
Configuring Worker-specific section¶
The below YAML provides a high-level structure for the Worker-specific section:
worker:
title: ${{title of worker}}
tags:
- ${{tag 1}}
- ${{tag 2}}
replicas: ${{worker replicas}}
autoscaling:
enabled: ${{enable autoscaling}}
minReplicas: ${{minimum replicas}}
maxReplicas: ${{maximum replicas}}
targetMemoryUtilizationPercentage: ${{60}}
targetCPUUtilizationPercentage: ${{70}}
stack: ${{stack name and version}}
logLevel: ${{log level}}
configs:
${{additional configuration}}
envs:
${{environment variable configuration}}
secrets:
- ${{secret configuration}}
dataosSecrets:
${{dataos secret resource configuration}}
dataosVolumes:
${{dataos volumes resource configuration}}
tempVolume: ${{temp volume name}}
persistentVolume:
${{persistent volume configuration}}
compute: runnable-default
resources:
requests:
cpu: ${{cpu requests}}
memory: ${{memory requests}}
limits:
cpu: ${{cpu limits}}
memory: ${{memory limits}}
dryRun: ${{enables dryrun}}
runAsApiKey: ${{dataos apikey}}
runAsUser: ${{dataos user-id}}
topology:
${{worker topology}}
stackSpec:
${{Stack-specific Attributes}}
Here's a summary of the attributes within the Worker-specific section:
Attribute | Data Type | Default Value | Possible Values | Requirement |
---|---|---|---|---|
worker | mapping | none | none | mandatory |
title | string | none | any valid string | optional |
tags | list of strings | none | list of valid strings | optional |
replicas | integer | 1 | any positive integer | optional |
autoscaling | mapping | none | valid autoscaling configuration | optional |
stack | string | none | valid stack name | mandatory |
logLevel | string | INFO | INFO, DEBUG, WARN, ERROR | optional |
configs | mapping | none | valid custom configurations in key-value format | optional |
envs | mapping | none | valid environment variable definitions | optional |
secrets | list of secrets | none | list of secret definitions | optional |
dataosSecrets | list of mappings | none | list of DataOS Secret Resource definitions | optional |
dataosVolumes | list of mappings | none | list of DataOS Volume Resource definitions | optional |
tempVolume | string | none | valid volume name | optional |
persistentVolume | mapping | none | valid persistent volume definition | optional |
compute | string | none | runnable-default or any other custom Compute Resource name | mandatory |
resources | mapping | none | valid CPU and memory resource requests and limits | optional |
dryRun | boolean | false | true or false | optional |
runAsApiKey | string | none | valid DataOS API key | optional |
runAsUser | string | none | valid user identity | optional |
topology | list of mappings | none | list of topology element definitions | mandatory |
stackSpec | mapping | none | valid stack-specific attributes | optional |
By configuring these sections as needed, data developers can create highly customizable Worker Resources. For a detailed explanation of the attributes within the Worker-specific section, you can refer to the Attributes of Worker-specific section.
Apply the Worker YAML¶
After creating the YAML configuration file for the Worker Resource, apply it to instantiate the Resource-instance in the DataOS environment. To apply the Worker YAML file, use the `apply` command.
dataos-ctl apply -f ${{yaml config file path}} -w ${{workspace name}}
# Sample
dataos-ctl apply -f dataproducts/new-worker.yaml -w curriculum
Sample Worker YAML
name: benthos3-worker-sample
version: v1beta
type: worker
tags:
- worker
- dataos:type:resource
- dataos:resource:worker
- dataos:layer:user
- dataos:workspace:public
description: Random User Console
worker:
tags:
- worker
replicas: 1
stack: benthos-worker:3.0
logLevel: DEBUG
compute: runnable-default
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 1000m
memory: 1024Mi
stackSpec:
input:
http_client:
headers:
Content-Type: application/octet-stream
url: https://randomuser.me/api/
verb: GET
output:
stdout:
codec: |
delim:
-----------GOOD------------
Verify Worker Creation¶
To ensure that your Worker has been successfully created, you can verify it in two ways:
Check the name of the newly created Worker in the list of workers created by you in a particular Workspace:
Alternatively, retrieve the list of all Workers created in the Workspace by appending the `-a` flag:
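A likely form of these verification commands, assuming the `dataos-ctl get` syntax follows the same pattern as the `apply` command shown earlier on this page (placeholders in `${{ }}`; verify the exact flags against your CLI version):

```shell
# List the Workers created by you in a particular Workspace
dataos-ctl get -t worker -w ${{workspace name}}

# List all Workers in the Workspace, regardless of creator, with the -a flag
dataos-ctl get -t worker -w ${{workspace name}} -a
```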
You can also access the details of any created Worker through the DataOS GUI, in the Resource tab of the Operations App.
Deleting a Worker¶
Use the `delete` command to remove the specific Worker Resource-instance from the DataOS environment. There are three ways to delete a Worker, as shown below.
Method 1: Copy the name-to-Workspace columns from the output table of the `get` command and use them as a string in the `delete` command.
Command:
Example:
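A plausible form of the Method 1 command, assuming the identifier string concatenates the name-to-Workspace columns of the `get` output table as `name:version:type:workspace`; verify the exact syntax against your CLI version:

```shell
# Template (assumed syntax)
dataos-ctl delete -i "${{name to workspace in the output table}}"

# Example, consistent with the output shown below
dataos-ctl delete -i "cnt-product-demo-01:v1beta:worker:public"
```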
Output:
INFO[0000] delete...
INFO[0001] deleting(public) cnt-product-demo-01:v1beta:worker...
INFO[0003] deleting(public) cnt-product-demo-01:v1beta:worker...deleted
INFO[0003] delete...complete
Method 2: Specify the path of the YAML file and use the `delete` command.
Command:
Example:
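A plausible form of the Method 2 command, assuming `delete` accepts a manifest file via `-f` the way `apply` does; the example path is hypothetical:

```shell
# Template (assumed syntax)
dataos-ctl delete -f ${{yaml config file path}}

# Hypothetical example path
dataos-ctl delete -f dataproducts/cnt-city-demo-010.yaml
```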
Output:
INFO[0000] delete...
INFO[0000] deleting(public) cnt-city-demo-010:v1beta:worker...
INFO[0001] deleting(public) cnt-city-demo-010:v1beta:worker...deleted
INFO[0001] delete...complete
Method 3: Specify the Workspace, Resource-type, and Worker name in the `delete` command.
Command:
Example:
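A plausible form of the Method 3 command, assuming the standard `-t` (type), `-w` (Workspace), and `-n` (name) flags; verify against your CLI version:

```shell
# Template (assumed syntax)
dataos-ctl delete -t worker -w ${{workspace name}} -n ${{worker name}}

# Example, consistent with the output shown below
dataos-ctl delete -t worker -w public -n cnt-city-demo-010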
Output:
INFO[0000] delete...
INFO[0000] deleting(public) cnt-city-demo-010:v1beta:worker...
INFO[0001] deleting(public) cnt-city-demo-010:v1beta:worker...deleted
INFO[0001] delete...complete
Attributes of Worker YAML¶
The Attributes of Worker YAML define the key properties and configurations that can be used to specify and customize Worker Resources within a YAML file. These attributes allow data developers to define the structure and behavior of their Worker Resources. For comprehensive information on each attribute and its usage, please refer to the link: Attributes of Worker YAML.
Worker Templates¶
The Worker templates serve as blueprints, defining the structure and configurations for various Workers. To know more, refer to the link: Worker Templates.
Worker Command Reference¶
Here is a reference to the various commands related to managing Workers in DataOS:
- Applying a Worker: Use the following command to apply a Worker using a YAML configuration file:
- Get Worker Status: To retrieve the status of a specific Worker, use the following command:
- Get Status of all Workers within a Workspace: To get the status of all Workers within the current context, use this command:
- Generate Worker JSON Schema: To generate the JSON schema for a Worker with a specific version (e.g., v1alpha), use the following command:
- Get Worker JSON Resource Schema: To obtain the JSON resource schema for a Worker with a specific version (e.g., v1alpha), use the following command:
- Delete Workers: To delete a specific Worker, use the following command:
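The commands referenced above likely take the following forms. The flags, and in particular the `develop schema` subcommands, are assumptions based on common `dataos-ctl` usage patterns and should be verified against your CLI version:

```shell
# Applying a Worker from a YAML configuration file
dataos-ctl apply -f ${{yaml config file path}} -w ${{workspace name}}

# Get the status of a specific Worker
dataos-ctl get -t worker -w ${{workspace name}} -n ${{worker name}}

# Get the status of all Workers within a Workspace
dataos-ctl get -t worker -w ${{workspace name}} -a

# Generate the Worker JSON schema for a given version (e.g., v1alpha)
dataos-ctl develop schema generate -t worker -v v1alpha

# Get the Worker JSON resource schema for a given version (e.g., v1alpha)
dataos-ctl develop get resource -t worker -v v1alpha

# Delete a specific Worker
dataos-ctl delete -t worker -w ${{workspace name}} -n ${{worker name}}
```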