Worker: Core Concepts
Key Characteristics
- Continuous Execution: Workers are built to run perpetually, performing their assigned tasks without a defined end time.
- No Ingress: Unlike Services, Workers do not expose ingress ports.
- Throughput-Based: Workers are throughput-based and do not require synchronous responses.
- Lightweight: Workers are lightweight compared to Services, as they do not require multiple open network ports. This makes them faster to deploy and more efficient.
- Specialized Execution: A Worker is a self-contained, independent unit, well suited to executing one specific task within a larger application.
- Autoscalability: Workers can be autoscaled to handle larger workloads, making them highly adaptable.
- Robustness: Workers are perfect for use cases where robustness and continuous execution are essential.
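
The characteristics above (continuous execution, no ingress, throughput-based processing) can be illustrated with a minimal Python sketch. This is not DataOS code or a DataOS API: the `run_worker` function, the in-memory queues, and the doubling step are hypothetical stand-ins for whatever source, sink, and task a real Worker would use.

```python
import queue
import threading


def run_worker(tasks, results, stop_event, poll_seconds=0.1):
    """Pull items until asked to stop. Nothing here listens on a network port."""
    processed = 0
    while not stop_event.is_set():
        try:
            item = tasks.get(timeout=poll_seconds)
        except queue.Empty:
            continue  # no work right now; keep running rather than exit
        results.put(item * 2)  # hypothetical placeholder for the real task
        processed += 1
        tasks.task_done()
    return processed


def demo():
    """Feed three items to one worker thread and collect what it produced."""
    tasks, results = queue.Queue(), queue.Queue()
    stop_event = threading.Event()
    worker = threading.Thread(target=run_worker, args=(tasks, results, stop_event))
    worker.start()
    for n in (1, 2, 3):
        tasks.put(n)
    tasks.join()      # wait until every queued item has been processed
    stop_event.set()  # the loop only ends when it is stopped from outside
    worker.join()
    return [results.get_nowait() for _ in range(3)]


if __name__ == "__main__":
    print(demo())
```

The loop pulls work instead of waiting for a caller, so no request/response cycle is involved, and it ends only when stopped externally. Under these assumptions, scaling out would amount to running more copies of the same loop against the same source.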
Workflow, Service, and Worker: Key Differences
Workflow, Service, and Worker are distinct runnable DataOS Resources, each with its own role. Data developers often need to decide which of the three fits a given task. The following table compares them by execution model, data dependency, and typical use cases to support that decision.
Characteristic | Workflow | Service | Worker |
---|---|---|---|
Overview | Workflows orchestrate sequences of tasks, jobs, or processes, terminating upon successful completion or failure. | Services are long-running processes that continuously operate, serve, and process API requests. | Workers execute specific tasks or computations continuously without a defined end time. |
Execution Model | Workflows process data in discrete chunks, following predefined DAGs (Directed Acyclic Graphs). | Services expose API endpoints and ingress ports for external data or request intake. They don’t have DAGs. | Workers perform continuous task execution independently, without synchronous inputs or ingress ports. |
Data Dependency | Workflows follow predefined orders or DAGs, depending on data input sequences. | Services rely on incoming data through ingress ports for logic execution. | Workers are throughput-based and do not require synchronous inputs or ingress ports. |
Stack Orchestration | Yes | Yes | Yes |
Scope | Workspace-level | Workspace-level | Workspace-level |
Use Cases | 1. Batch Data Processing Pipelines: Ideal for orchestrating complex data processing pipelines. 2. Scheduled Jobs: Perfect for automating tasks at specific intervals, such as data backups and ETL processes. | 1. API Endpoints: Used to create API endpoints for various purposes, such as data retrieval and interaction with external systems. 2. User Interfaces: Suitable for building interfaces that interact with data or services. | 1. Continuous Processing: Perfect for tasks like real-time analytics and event-driven operations. 2. Independence: Ideal for creating independent systems that perform specific tasks indefinitely. |
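
As a rough aid, the comparison can be condensed into two questions: does the job run indefinitely, and does it need ingress? The sketch below encodes that reading of the table in Python. The function name `choose_resource` and its parameters are hypothetical, and the rule is a simplification that ignores other factors a real decision may involve.

```python
def choose_resource(runs_indefinitely: bool, needs_ingress: bool) -> str:
    """Suggest a Resource type from the two distinctions drawn in the table."""
    if not runs_indefinitely:
        # Workflows terminate on completion or failure.
        return "Workflow"
    # Both remaining types are long-running; only Services expose ingress.
    return "Service" if needs_ingress else "Worker"


if __name__ == "__main__":
    print(choose_resource(runs_indefinitely=False, needs_ingress=False))
    print(choose_resource(runs_indefinitely=True, needs_ingress=True))
    print(choose_resource(runs_indefinitely=True, needs_ingress=False))
```

For example, a scheduled ETL job maps to a Workflow, an API endpoint to a Service, and an indefinitely running stream processor with no inbound requests to a Worker.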