Apply a Workflow and get runtime status using CLI Stack
The dataos-ctl Stack, also called the CLI Stack, can be orchestrated using a Workflow Resource, where each job executes its command once and exits on completion. This makes it possible to build Continuous Integration and Continuous Deployment (CI/CD) Workflows that chain multiple CLI commands into a single automated deployment process.
How to use CLI Stack?
Utilizing CLI Stack involves a series of logical steps, as outlined below:
- Create an Instance Secret manifest
- Apply the Instance Secret manifest
- Create a Workflow manifest
- Apply the Workflow manifest
- Verify Workflow creation
- Check Workflow Logs to validate execution
Create an Instance Secret manifest
To execute a Resource using this Stack, users need to provide their User ID and API key. This information is supplied through an Instance Secret: first create an Instance Secret Resource, then refer to this secret within the Workflow Resource.
To fetch the User ID and the User API key token, execute the following commands after logging into DataOS.
To fetch the User ID:
dataos-ctl user get
# Sample Output
INFO[0000] user get...
INFO[0000] user get...complete
      NAME     │     ID      │  TYPE  │        EMAIL         │              TAGS
───────────────┼─────────────┼────────┼──────────────────────┼─────────────────────────────────
  IamGroot     │ iamgroot    │ person │ iamgroot@tmdc.io     │ roles:id:data-dev,
               │             │        │                      │ roles:id:operator,
               │ # user_id   │        │                      │ roles:id:system-dev,
               │             │        │                      │ roles:id:user,
               │             │        │                      │ users:id:iamgroot
To fetch the User API key token, if one already exists, execute the following command:
dataos-ctl user apikey get
# Expected Output
TOKEN                                                                    │ TYPE   │ EXPIRATION                │ NAME
─────────────────────────────────────────────────────────────────────────┼────────┼───────────────────────────┼───────────────────────────────────────
jjjjjk5lX21hbGFtdXRlLmI4ZjRlNzc2LTYyNTAtNGI4MC05YTZhLTMwMzI3N2Y3Y2JhZQ== │ apikey │ 2024-06-18T19:30:00+05:30 │ token_newly_entirely_divine_malamute
dG9rZW5faGFyZGx5X3BoLmI3NTcwZmFjLWZlNmEtNDE4NC1iNjA3LTc5MjM1ODVlNDQxYQ== │ apikey │ 2024-06-28T05:30:00+05:30 │ token_hardly_physically_model_maggot
kkloZWxpY2F0ZV9zaGVlcC4xNmRkOTYwOS1mMjRhLTRiMWEtYTc0ZC0OTJjOTExYjE0ZTQ== │ apikey │ 2024-06-28T05:30:00+05:30 │ token_grossly_socially_delicate_sheep
If no apikey token exists, create a new one using the following command:
dataos-ctl user apikey create
# Sample Output
INFO[0000] user apikey get...
INFO[0000] user apikey get...complete
TOKEN                                │ TYPE   │ EXPIRATION           │ NAME
─────────────────────────────────────┼────────┼──────────────────────┼────────────
abcdefghijklmnopqrstuvcdefghijklmnop │ apikey │ 2023-12-29T14:00:00Z │ token_abcd
# dataos_user_apikey_token
In the Instance Secret manifest, replace ${dataos_user_id} and ${dataos_user_apikey_token} with the values obtained from the commands above.
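The manifest itself is not reproduced in this section. As a minimal sketch of what it could look like, the script below writes one to a file using a shell heredoc. The Resource name `dataos-ctl-user-apikey` matches the name the Workflow refers to under `dataosSecrets`. Everything else is an assumption to check against the Instance Secret documentation for your DataOS version: the file name `instance-secret.yaml`, the `layer`, `type: key-value`, and `acl` fields, and the key names `USER_ID` and `APIKEY`.

```shell
#!/usr/bin/env bash
# Sketch only: writes an Instance Secret manifest to instance-secret.yaml.
# Field and key names (layer, acl, USER_ID, APIKEY) are assumptions.
set -euo pipefail

# Values obtained from `dataos-ctl user get` and `dataos-ctl user apikey get`.
# The defaults are placeholders so the script runs without any setup.
dataos_user_id="${dataos_user_id:-iamgroot}"
dataos_user_apikey_token="${dataos_user_apikey_token:-replace-with-your-token}"

# Unquoted EOF so the two shell variables are substituted into the file.
cat > instance-secret.yaml <<EOF
name: dataos-ctl-user-apikey   # must match the name under dataosSecrets in the Workflow
version: v1
type: instance-secret
layer: user
instance-secret:
  type: key-value
  acl: rw
  data:
    USER_ID: ${dataos_user_id}
    APIKEY: ${dataos_user_apikey_token}
EOF

echo "wrote instance-secret.yaml"
```

Exporting `dataos_user_id` and `dataos_user_apikey_token` before running the script keeps the token out of the file until it is generated, which helps avoid committing it to version control.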
Apply the Instance Secret manifest
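No command is listed for this step. A hedged sketch of how it might look is given below, assuming an Instance Secret is applied with the same `apply` command used for the Workflow, but without a Workspace flag because it is not Workspace-scoped. The file path `instance-secret.yaml` and the helper name `apply_instance_secret` are placeholders introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: apply an Instance Secret manifest if the file and CLI are present.
set -euo pipefail

manifest="${1:-instance-secret.yaml}"   # placeholder path

# apply_instance_secret FILE
# Returns 1 if FILE is missing, 127 if dataos-ctl is not installed.
apply_instance_secret() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "manifest not found: $file" >&2
    return 1
  fi
  if ! command -v dataos-ctl >/dev/null 2>&1; then
    echo "dataos-ctl not on PATH; would run: dataos-ctl apply -f $file" >&2
    return 127
  fi
  # Assumption: no -w flag, since an Instance Secret is not Workspace-scoped.
  dataos-ctl apply -f "$file"
}

apply_instance_secret "$manifest" || echo "instance secret was not applied"
```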
Create a Workflow manifest
The DataOS CLI Stack is orchestrated using the Workflow Resource-type.
The sample YAML below defines a Workflow with two jobs: the first applies a Flare Workflow through the CLI Stack, and the second, which depends on the first, runs the get command for a Workflow:
# Resource meta section
name: dataos-ctl-workflow-lifecycle-02
version: v1
type: workflow
# Workflow-specific section
workflow:
  dag:
    # First Job
    - name: create-workflow
      spec:
        stack: dataos-ctl # dataos-ctl stack name
        compute: runnable-default
        # Referred Instance secrets
        dataosSecrets:
          - name: dataos-ctl-user-apikey # Instance secret name same as declared above
            allKeys: true
            consumptionType: envVars
        # Stack-specific section
        stackSpec:
          arguments:
            - resource
            - apply
            - -f
            - /etc/dataos/config/manifest.yaml
            - -w
            - public
          # Manifest for the Resource against which the above command is executed
          manifest:
            # Resource Section
            # This is a lakehouse-to-lakehouse Workflow, so its inputs and outputs are specified differently.
            version: v1
            name: wf-tmdc-01
            type: workflow
            tags:
              - Connect
              - City
            description: The job ingests city data from dropzone into raw zone
            # Workflow-specific Section
            workflow:
              title: Connect City
              dag:
                # Job 1 Specific Section
                - name: read-write-lakehouse # Job 1 name
                  title: City Dimension Ingester
                  description: The job ingests city data from dropzone into raw zone
                  spec:
                    tags:
                      - Connect
                      - City
                    stack: flare:7.0 # The job gets executed upon the Flare Stack, so it's a Flare Job
                    compute: runnable-default
                    # Flare Stack-specific Section
                    stackSpec:
                      driver:
                        coreLimit: 1100m
                        cores: 1
                        memory: 1048m
                      job:
                        explain: true # job section will contain explain, log-level, inputs, outputs and steps
                        logLevel: INFO
                        inputs:
                          - name: city_connect
                            query: SELECT
                              *,
                              date_format (NOW(), 'yyyyMMddHHmm') AS version1,
                              NOW() AS ts_city1
                              FROM
                              lakehouse.retail.city
                            # dataset: dataos://lakehouse:retail/city
                            # format: Iceberg
                            options:
                              SSL: "true"
                              driver: "io.trino.jdbc.TrinoDriver"
                              cluster: "system"
                            # schemaPath: dataos://thirdparty01:none/schemas/avsc/city.avsc # schema path is not necessary for lakehouse to lakehouse
                        outputs:
                          - name: city_connect
                            dataset: dataos://lakehouse:retail/city10?acl=rw
                            format: Iceberg
                            description: City data ingested from retail city
                            tags:
                              - retail
                              - city
                            options:
                              saveMode: overwrite
    # Second Job
    - name: get-workflow
      spec:
        stack: dataos-ctl
        compute: runnable-default
        # Referred Instance secrets
        dataosSecrets:
          - name: dataos-ctl-user-apikey
            allKeys: true
            consumptionType: envVars
        # Stack-specific section
        stackSpec:
          arguments:
            - resource
            - get
            - -t
            - workflow
            - -n
            - temp001 # Workflow name to look up; use wf-tmdc-01 to fetch the Workflow applied by the first job
            - -w
            - public
      dependencies:
        - create-workflow # Second Job dependent on successful execution of First Job
Apply the Workflow manifest
dataos-ctl apply -f ${workflow yaml file path} -w ${workspace name}
# Sample Output
dataos-ctl apply -f workflow/volume_lifecycle.yml
INFO[0000] apply...
INFO[0000] 🔧 applying(public) dataos-ctl-workflow-lifecycle-02:v1:workflow...
INFO[0003] 🔧 applying(public) dataos-ctl-workflow-lifecycle-02:v1:workflow...created
INFO[0003] apply...complete
Verify Workflow creation
dataos-ctl get -t workflow -w ${workspace name}
# Sample Output
INFO[0000] get...
INFO[0001] get...complete
NAME | VERSION | TYPE | WORKSPACE | STATUS | RUNTIME | OWNER
-----------------------------------|---------|----------|-----------|--------|-----------|-----------------
dataos-ctl-workflow-lifecycle-02 | v1 | workflow | public | active | succeeded | iamgroot
wf-tmdc-01 | v1 | workflow | public | active | succeeded | iamgroot
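When this check runs inside a CI/CD script, the RUNTIME column is usually the value of interest. The sketch below shows one way to extract it from the table with `awk`. It assumes the columns are separated by `|` and appear in the order shown in the sample output above; the function name `runtime_of` is hypothetical, and the sample rows stand in for a live call to the CLI.

```shell
#!/usr/bin/env bash
# Sketch only: pull the RUNTIME column for one Workflow out of `get` output.
set -euo pipefail

# runtime_of NAME
# Reads the table on stdin and prints the RUNTIME value for the row named NAME.
# Exits non-zero if no row matches.
runtime_of() {
  awk -F'|' -v name="$1" '
    { for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i) }  # trim cells
    $1 == name { print $6; found = 1 }                              # column 6 = RUNTIME
    END { exit found ? 0 : 1 }
  '
}

# Sample rows copied from the output above, used here instead of a live call.
sample='dataos-ctl-workflow-lifecycle-02 | v1 | workflow | public | active | succeeded | iamgroot
wf-tmdc-01 | v1 | workflow | public | active | succeeded | iamgroot'

printf '%s\n' "$sample" | runtime_of wf-tmdc-01
```

Against a live environment, the same function could be fed from `dataos-ctl get -t workflow -w public`. If your CLI version draws the table with box-drawing separators instead of `|`, the field separator passed to `awk` needs to change accordingly.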
Check Workflow Logs to validate execution
Copy the values from NAME to WORKSPACE in the output table of the get command and use them as a string in the log command.
dataos-ctl -i "dataos-ctl-workflow-lifecycle-02 | v1 | workflow | public" log
# Expected Output
INFO[0000] log(public)...
INFO[0000] log(public)...complete
      NODE NAME       │ CONTAINER NAME │ ERROR
──────────────────────┼────────────────┼────────
 get-workflow-execute │ main           │
-------------------LOGS-------------------
time="2024-06-18T12:54:38Z" level=info msg="get..."
time="2024-06-18T12:54:39Z" level=info msg="get...nothing"
time="2024-06-18T12:54:39.818Z" level=info msg="sub-process exited" argo=true error="<nil>"
        NODE NAME        │ CONTAINER NAME │ ERROR
─────────────────────────┼────────────────┼────────
 create-workflow-execute │ main           │
-------------------LOGS-------------------
time="2024-06-18T12:54:18Z" level=info msg="apply..."
time="2024-06-18T12:54:18Z" level=info msg="🔧 applying(public) wf-tmdc-01:v1:workflow..."
time="2024-06-18T12:54:24Z" level=info msg="🔧 applying(public) wf-tmdc-01:v1:workflow...updated"
time="2024-06-18T12:54:24Z" level=info msg="apply...complete"
time="2024-06-18T12:54:24.734Z" level=info msg="sub-process exited" argo=true error="<nil>"
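In the logs above, each job ends with a "sub-process exited" line whose error field is `<nil>`. A rough sketch of automating that check follows, assuming a failed job would report something other than `<nil>` in the same field. The function name `failed_exits` is hypothetical, and the sample lines are copied from the logs above rather than read from a live run.

```shell
#!/usr/bin/env bash
# Sketch only: scan Workflow logs for sub-process exit lines that report an error.
set -euo pipefail

# failed_exits
# Reads log text on stdin and prints every "sub-process exited" line whose
# error field is not "<nil>". Returns 0 if there are none, 1 otherwise.
failed_exits() {
  local bad
  bad="$(grep 'msg="sub-process exited"' | grep -v 'error="<nil>"' || true)"
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
  return 0
}

# Exit lines copied from the sample logs above.
sample='time="2024-06-18T12:54:39.818Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2024-06-18T12:54:24.734Z" level=info msg="sub-process exited" argo=true error="<nil>"'

if printf '%s\n' "$sample" | failed_exits; then
  echo "all jobs exited cleanly"
fi
```

A clean exit only shows that the CLI command returned without error. The `get...nothing` line in the second job's log is still worth reading by eye, since it indicates that no Workflow matched the requested name.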