Attributes of Workflow YAML Configuration¶
Structure of a Workflow manifest¶
```yaml
name: ${resource_name} # Name of the Resource (mandatory)
version: v1beta # Manifest version of the Resource (mandatory)
type: workflow # Type of Resource (mandatory)
tags: # Tags for categorizing the Resource (optional)
  - ${tag_example_1}
  - ${tag_example_2}
description: ${resource_description} # Description (optional)
owner: ${resource_owner} # Owner of the Resource (optional, default value: user-id of user deploying the resource)
layer: ${resource_layer} # DataOS Layer (optional, default value: user)
workflow:
  title: ${title of workflow}
  schedule:
    cron: ${*/10 * * * *}
    concurrencyPolicy: ${Allow}
    endOn: ${2022-01-01T23:40:45Z}
    timezone: ${Asia/Kolkata}
  dag:
    - name: ${job1-name}
      description: ${description}
      title: ${title of job}
      tags:
        - ${tag1}
        - ${tag2}
      gcWhenComplete: true
      spec:
        stack: ${flare:5.0}
        logLevel: ${INFO}
        configs:
          ${alpha: beta}
        envs:
          ${random: delta}
        secrets:
          - ${mysecret}
        dataosSecrets:
          - name: ${mysecret}
            workspace: ${curriculum}
            key: ${newone}
            keys:
              - ${newone}
              - ${oldone}
            allKeys: ${true}
            consumptionType: ${envVars}
        dataosVolumes:
          - name: ${myVolume}
            directory: ${/file}
            readOnly: ${true}
            subPath: ${/random}
        tempVolume: ${abcd}
        persistentVolume:
          name: ${myVolume}
          directory: ${/file}
          readOnly: ${true}
          subPath: ${/random}
        compute: ${compute resource name}
        resources:
          requests:
            cpu: ${100m}
            memory: ${100Gi}
          limits:
            cpu: ${100m}
            memory: ${100Gi}
        dryRun: ${true}
        runAsApiKey: ${abcdefghijklmnopqrstuvwxyz}
        runAsUser: ${iamgroot}
        topology:
          - name: ${abcd}
            type: ${efgh}
            doc: ${abcd efgh}
            properties:
              ${alpha: random}
            dependencies:
              - ${abc}
      file: ${abcd}
      retry:
        count: ${2}
        strategy: ${"OnTransientError"}
        duration: <string>
        maxDuration: <string>
```
Configuration¶
Resource meta section¶
This section serves as the header of the manifest file, defining the overall characteristics of the Workflow Resource you wish to create. It includes attributes common to all types of Resources in DataOS. These attributes help DataOS in identifying, categorizing, and managing the Resource within its ecosystem. To learn about the attributes of this section, refer to the following link: Attributes of Resource meta section.
Workflow-specific Section¶
This section comprises attributes specific to the Workflow Resource. The attributes within the section are listed below:
workflow¶
Description: the top-level mapping containing all Workflow-specific attributes.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
Example Usage:
title¶
Description: title of the Workflow
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | any string |
Example Usage:
schedule¶
Description: the schedule mapping specifies the scheduling configuration for a Scheduled (Cron) Workflow.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional (mandatory for Scheduled Workflows) | none | none |
Example Usage:
cron¶
Description: the cron attribute contains the cron expression, a string comprising six or seven sub-expressions (fields) that provide the specific details of the schedule.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional (mandatory for Scheduled Workflows) | none | any valid cron expression |
Additional Details: the fields of a cron expression are separated by whitespace; make sure there are no formatting issues in the expression.
Example Usage:
concurrencyPolicy¶
Description: the `concurrencyPolicy` attribute determines how concurrent executions of a Workflow, created by a scheduled Workflow, are handled.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | Allow | Allow/Forbid/Replace |
Additional Details:

- `concurrencyPolicy: Forbid` - When the `concurrencyPolicy` is set to `Forbid`, the Schedule/Cron Workflow strictly prohibits concurrent runs. In this scenario, if it is time for a new Workflow run and the previous Workflow run is still in progress, the cron Workflow will skip the new Workflow run altogether.
- `concurrencyPolicy: Allow` - Setting the `concurrencyPolicy` to `Allow` enables the Schedule/Cron Workflow to accommodate concurrent executions. If it is time for a new Workflow run and the previous Workflow run has not completed yet, the cron Workflow will proceed with the new Workflow run concurrently.
- `concurrencyPolicy: Replace` - When the `concurrencyPolicy` is set to `Replace`, the Schedule/Cron Workflow handles concurrent executions by replacing the currently running Workflow run with a new one if it is time for the next run and the previous one is still in progress.
Example Usage:
endOn¶
Description: `endOn` terminates the scheduled Workflow run at the specified time, even if the last Workflow run triggered before the threshold time isn't complete.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | any time provided in ISO 8601 format |
Example Usage:
The timestamp 2022-01-01T23:30:45Z follows the ISO 8601 format:

- Date: 2022-01-01 (`YYYY-MM-DD`)
- T: separator indicating the start of the time portion in the datetime string.
- Time: 23:30:45 (`hh:mm:ss`)
- Timezone: Z, indicating the time is in Coordinated Universal Time (UTC), also known as Zulu time.

It represents January 1, 2022, at 23:30:45 UTC.
timezone¶
Description: Time zone for scheduling the workflow.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | Asia/Kolkata, America/Los_Angeles, etc |
Example Usage:
dag¶
Description: DAG is a Directed Acyclic Graph, a conceptual representation of a sequence of jobs (or activities). These jobs in a DAG are executed in the order of dependencies between them.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
Additional Details: there should be at least one job within a DAG.
Example Usage:
```yaml
dag:
  - name: profiling-job
    spec:
      stack: flare:5.0
      compute: runnable-default
      stackSpec:
        {} # Flare Stack-specific attributes
```
name¶
Description: name of the Job
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | any string conforming to the regex `[a-z0-9]([-a-z0-9]*[a-z0-9])` and length less than or equal to 48 |
Example Usage:
title¶
Description: title of Job
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | any string |
Example Usage:
description¶
Description: text describing the Job
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | any string |
Example Usage:
tags¶
Description: tags associated with the Job.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of strings | optional | none | valid tags |
Example Usage:
gcWhenComplete¶
Description: indicates whether the job's underlying resources should be garbage collected once the job run completes.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
boolean | optional | false | true or false |
Example Usage:
spec¶
Description: Specs of the Job.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
Example Usage:
```yaml
spec:
  stack: flare:5.0
  compute: runnable-default
  stackSpec:
    {} # Flare Stack-specific configurations
```
stack¶
Description: The name and version of the Stack Resource which the Workflow orchestrates.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | flare / toolbox / scanner / dataos-ctl / soda+python / steampipe |
Additional Details: To know more about each stack, go to Stack.
Example Usage:
logLevel¶
Description: the log level for the Workflow classifies log entries by urgency, which helps filter logs during search and controls the amount of information in the logs.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | INFO | INFO, WARN, DEBUG, ERROR |
Additional Details:

- `INFO`: Designates informational messages that highlight the progress of the Workflow.
- `WARN`: Designates potentially harmful situations.
- `DEBUG`: Designates fine-grained informational events that are most useful while debugging.
- `ERROR`: Designates error events that might still allow the Workflow to continue running.
Example Usage:
configs¶
Description: additional optional configuration for the Workflow.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional | none | key-value configurations |
Example Usage:
envs¶
Description: environment variables for the Workflow.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional | none | key-value configurations |
Example Usage:
```yaml
envs:
  DEPOT_SERVICE_URL: http://depotservice-api.depot.svc.cluster.local:8000/ds/
  HTTP_CONNECT_TIMEOUT_MS: 60000
  HTTP_SOCKET_TIMEOUT_MS: 60000
```
Additional Details:
- DEPOT_SERVICE_URL: Specifies the base URL for the Depot Service API. This is the endpoint that the service interacts with for managing Depots.
- HTTP_CONNECT_TIMEOUT_MS: Defines the connection timeout for HTTP requests, in milliseconds. If a connection to a remote server cannot be established within this timeframe (60 seconds in this case), the request will timeout. This ensures that the workload does not hang indefinitely while attempting to connect.
- HTTP_SOCKET_TIMEOUT_MS: Sets the socket timeout for HTTP requests, in milliseconds. This controls the maximum time that the service will wait for data after a connection has been established. If data is not received from the connected server within this period (60 seconds), the request will timeout. This helps prevent long delays in response handling when waiting for data transfer.
secrets¶
Description: list of secrets associated with the Workflow.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of strings | optional | none | none |
Example Usage:
dataosSecrets¶
Description: list of DataOS Secrets associated with the Workflow. Each DataOS Secret is a mapping containing various attributes.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of mappings | optional | none | none |
Example Usage:
```yaml
workflow:
  dataosSecrets:
    - name: mysecret
      workspace: curriculum
      key: newone
      keys:
        - newone
        - oldone
      allKeys: true
      consumptionType: envVars
```
dataosVolumes¶
Description: list of DataOS Volumes associated with the Workflow. Each DataOS Volume is a mapping containing various attributes.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of mappings | optional | none | none |
Example Usage:
tempVolume¶
Description: The temporary volume of the Workflow.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | any valid Volume name |
Example Usage:
persistentVolume¶
Description: configuration for the persistent volume associated with the Workflow.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional | none | none |
Example Usage:
compute¶
Description: the name of the Compute Resource for the Workflow.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | valid runnable-type Compute Resource name. |
Example Usage:
resources¶
Description: Resource requests and limits for the Workflow. This includes CPU and memory specifications.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional | none | none |
Example Usage:
dryRun¶
Description: Indicates whether the workflow is in dry run mode. When enabled, the dryRun property deploys the Workflow to the cluster without submitting it.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
boolean | optional | true | true or false. |
Example Usage:
runAsUser¶
Description: when the `runAsUser` attribute is configured with the UserID of the use-case assignee, it grants the authority to perform operations on behalf of that user.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | userID of the Use Case Assignee |
Example Usage:
runAsApiKey¶
Description: The runAsApiKey attribute allows a user to assume another user's identity by providing the latter's API key.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | any valid API key. |
Additional Details: the API key can be obtained by executing the following command from the CLI:

```bash
dataos-ctl user apikey get
```

If no API key is available, the command below can be run to create a new one:

```bash
dataos-ctl user apikey create -n ${{name of the apikey}} -d ${{duration for the apikey to live}}
```
Example Usage:
topology¶
Description: the `topology` attribute is used to define the topology of the Workflow. It specifies the elements and dependencies within the Workflow's topology.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
list of mappings | mandatory | none | list of topology element definitions |
Example Usage:
```yaml
topology:
  - name: random # mandatory
    type: alpha # mandatory
    doc: new # Documentation for the element
    properties:
      random: lost # Custom properties for the element
    dependencies:
      - new1
      - new2
```
file¶
Description: attribute for specifying the file path for a Workflow YAML
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | none |
Example Usage:
retry¶
Description: configuration for retrying failed jobs.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional | none | none |
Example Usage:
count¶
Description: the number of times a failed job is retried.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
integer | optional | none | any positive integer |
Example Usage:
strategy¶
Description: the strategy for choosing which job failures to retry.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | Always / OnFailure / OnError / OnTransientError |
Additional Details:

- `Always` - Retry all failed steps.
- `OnFailure` - Retry steps whose main container is marked as failed in Kubernetes (this is the default).
- `OnError` - Retry steps that encounter errors or whose init or wait containers fail.
- `OnTransientError` - Retry steps that encounter errors defined as transient, or errors matching the `TRANSIENT_ERROR_PATTERN` environment variable.
Example Usage:
dependencies¶
Description: specifies the dependency between jobs/Workflows
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | none |
Example Usage: