Apply a Workflow and get runtime status using CLI Stack

The dataos-ctl stack, also called the CLI Stack, can be orchestrated using a Workflow Resource, where each job executes a command once and concludes the process upon completion. This makes it a key building block for Continuous Integration and Continuous Deployment (CI/CD) workflows that chain multiple CLI commands into a cohesive, automated deployment process.
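
As a quick orientation, the sketch below shows the general shape of such a Workflow: a single DAG job runs on the dataos-ctl stack and receives a CLI command as an argument list. All names here are illustrative placeholders; a full working manifest is built step by step in the sections that follow.

# Minimal sketch of a Workflow job running on the CLI Stack
name: cli-stack-sketch
version: v1
type: workflow
workflow:
  dag:
    - name: run-cli-command
      spec:
        stack: dataos-ctl          # the CLI Stack
        compute: runnable-default
        stackSpec:
          arguments:               # equivalent to: dataos-ctl resource get -t workflow -w public
            - resource
            - get
            - -t
            - workflow
            - -w
            - public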

How to use the CLI Stack?

Using the CLI Stack involves a series of logical steps, as outlined below; an end-to-end command summary follows the list:

  1. Create an Instance Secret manifest
  2. Apply the Instance Secret manifest
  3. Create a Workflow manifest
  4. Apply the Workflow manifest
  5. Verify Workflow creation
  6. Check Workflow Logs to validate execution
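
In terms of CLI invocations, these six steps boil down to the sequence below (file paths and the identifier string are illustrative):

# End-to-end command summary
dataos-ctl apply -f instance_secret.yaml                    # steps 1-2
dataos-ctl apply -f workflow.yaml -w public                 # steps 3-4
dataos-ctl get -t workflow -w public                        # step 5
dataos-ctl -i "<name | version | type | workspace>" log     # step 6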

Create an Instance Secret manifest

To execute a resource using this stack, users need to provide their API key and User ID. This information can be supplied using an Instance Secret: first create an Instance Secret Resource, then reference this secret within the Workflow Resource.

To fetch the details about the User ID and User API Key token, execute the following commands after logging into DataOS:

To fetch the User ID:

dataos-ctl user get
# Sample Output
INFO[0000] πŸ˜ƒ user get...                                
INFO[0000] πŸ˜ƒ user get...complete                        

      NAME     β”‚     ID      β”‚  TYPE  β”‚        EMAIL         β”‚              TAGS               
───────────────┼─────────────┼────────┼──────────────────────┼─────────────────────────────────
  IamGroot     β”‚ iamgroot    β”‚ person β”‚   iamgroot@tmdc.io   β”‚ roles:id:data-dev,              
               β”‚             β”‚        β”‚                      β”‚ roles:id:operator,              
               β”‚  # user_id  β”‚        β”‚                      β”‚ roles:id:system-dev,            
               β”‚             β”‚        β”‚                      β”‚ roles:id:user,                  
               β”‚             β”‚        β”‚                      β”‚ users:id:iamgroot

To fetch the User API key token, if an API key token already exists, execute:

dataos-ctl user apikey get
# Expected Output
                                   TOKEN                                    β”‚  TYPE  β”‚        EXPIRATION         β”‚                  NAME                               
────────────────────────────────────────────────────────────────────────────┼────────┼───────────────────────────┼──────────────────────────────────────────
  jjjjjk5lX21hbGFtdXRlLmI4ZjRlNzc2LTYyNTAtNGI4MC05YTZhLTMwMzI3N2Y3Y2JhZQ==  β”‚ apikey β”‚ 2024-06-18T19:30:00+05:30 β”‚ token_newly_entirely_divine_malamute     
  dG9rZW5faGFyZGx5X3BoLmI3NTcwZmFjLWZlNmEtNDE4NC1iNjA3LTc5MjM1ODVlNDQxYQ==  β”‚ apikey β”‚ 2024-06-28T05:30:00+05:30 β”‚ token_hardly_physically_model_maggot     
  kkloZWxpY2F0ZV9zaGVlcC4xNmRkOTYwOS1mMjRhLTRiMWEtYTc0ZC0OTJjOTExYjE0ZTQ==  β”‚ apikey β”‚ 2024-06-28T05:30:00+05:30 β”‚ token_grossly_socially_delicate_sheep    

If no apikey token exists, create a new one using the following command:

dataos-ctl user apikey create
# Sample Output
INFO[0000] πŸ”‘ user apikey get...                         
INFO[0000] πŸ”‘ user apikey get...complete                 

                   TOKEN                 β”‚  TYPE  β”‚      EXPIRATION      β”‚    NAME                   
─────────────────────────────────────────┼────────┼──────────────────────┼──────────────
  abcdefghijklmnopqrstuvcdefghijklmnop   β”‚ apikey β”‚ 2023-12-29T14:00:00Z β”‚ token_abcd
  # the TOKEN value is the dataos_user_apikey_token
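
By default, create generates the token name and expiry shown above. You can also request an explicit name and duration; the -n and -d flags below are assumed from common dataos-ctl usage, so check your CLI's help if they differ:

dataos-ctl user apikey create -n ${apikey name} -d ${duration}

# Example
dataos-ctl user apikey create -n token_cli_stack_demo -d 24h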

In the Instance Secret manifest provided below, replace ${dataos_user_id} and ${dataos_user_apikey_token} with the values obtained from the commands above:

instance_secret_template.yml
# Resource meta section
name: dataos-ctl-user-apikey # Name of the Instance Secret Resource
version: v1
type: instance-secret
layer: user

# Instance-secret specific section
instance-secret:
  type: key-value
  acl: rw
  data:
    USER_ID: ${dataos_user_id} 
    APIKEY: ${dataos_user_apikey_token}
instance_secret_example.yml
# Resource meta section
name: dataos-ctl-user-apikey
version: v1
type: instance-secret
layer: user

# Instance-secret specific section
instance-secret:
  type: key-value
  acl: rw
  data:
    USER_ID: iamgroot
    APIKEY: abcdefghijklmnopqrstuvwxyzabcd
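
If you prefer to script this substitution instead of editing the file by hand, one possible approach (assuming a POSIX shell with GNU envsubst installed; the variable names match the placeholders in the template) is:

export dataos_user_id="iamgroot"
export dataos_user_apikey_token="abcdefghijklmnopqrstuvwxyzabcd"
envsubst < instance_secret_template.yml > instance_secret.yaml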

Apply the Instance Secret manifest

dataos-ctl apply -f ${instance secret yaml file path}

# Example
dataos-ctl apply -f /home/iamgroot/workflow/instance_secret.yaml

# Expected Output
INFO[0000] πŸ›  apply...                                   
INFO[0000] πŸ”§ applying dataos-ctl-user-apikey:v1:instance-secret... 
INFO[0002] πŸ”§ applying dataos-ctl-user-apikey:v1:instance-secret...created 
INFO[0002] πŸ›  apply...complete
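
Before moving on, you can optionally confirm that the Instance Secret exists. One way to do so (instance secrets are instance-level resources, so no workspace flag is needed):

dataos-ctl get -t instance-secret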

Create a Workflow manifest

The DataOS CLI Stack can be orchestrated using the Workflow Resource type.

The sample YAML below defines a Workflow with two jobs: the first applies a Workflow Resource manifest, and the second fetches the status of a Workflow.

# Resource meta section
name: dataos-ctl-workflow-lifecycle-02
version: v1
type: workflow

# Workflow-specific section
workflow:
  dag:

# First Job
  - name: create-workflow
    spec:
      stack: dataos-ctl # dataos-ctl stack name
      compute: runnable-default

        # Referenced Instance Secrets
      dataosSecrets:
      - name: dataos-ctl-user-apikey # Instance secret name same as declared above
        allKeys: true
        consumptionType: envVars

        # Stack-specific section
      stackSpec:
        arguments:
          - resource
          - apply
          - -f
          - /etc/dataos/config/manifest.yaml
          - -w
          - public

        # Manifest for the Resource against which the above command is executed
        manifest:
          # Resource Section
          # This is an Icebase-to-Icebase workflow, so inputs and outputs are specified differently from external-source jobs.
          version: v1 
          name: wf-tmdc-01 
          type: workflow 
          tags:
            - Connect
            - City
          description: The job ingests city data from dropzone into raw zone

          # Workflow-specific Section
          workflow:
            title: Connect City
            dag: 

          # Job 1 Specific Section
              - name: read-write-icebase # Job 1 name
                title: City Dimension Ingester
                description: The job ingests city data from dropzone into raw zone
                spec:
                  tags:
                    - Connect
                    - City
                  stack: flare:5.0 # The job executes upon the Flare Stack, so it's a Flare job
                  compute: runnable-default

                  # Flare Stack-specific Section
                  stackSpec:
                    driver:
                      coreLimit: 1100m
                      cores: 1
                      memory: 1048m
                    job:
                      explain: true  # The job section contains explain, logLevel, inputs, outputs, and (optionally) steps
                      logLevel: INFO

                      inputs:
                        - name: city_connect
                          query: |
                            SELECT
                              *,
                              date_format(NOW(), 'yyyyMMddHHmm') AS version1,
                              NOW() AS ts_city1
                            FROM
                              icebase.retail.city
                          # dataset: dataos://icebase:retail/city
                          # format: Iceberg
                          options:
                            SSL: "true"
                            driver: "io.trino.jdbc.TrinoDriver"
                            cluster: "system"

                          # schemaPath is not necessary for Icebase-to-Icebase jobs
                          # schemaPath: dataos://thirdparty01:none/schemas/avsc/city.avsc

                      outputs:
                        - name: city_connect
                          dataset: dataos://icebase:retail/city10?acl=rw
                          format: Iceberg
                          description: City data ingested from retail city
                          tags:
                            - retail
                            - city
                          options:
                            saveMode: overwrite

# Second Job
  - name: get-workflow
    spec:
      stack: dataos-ctl
      compute: runnable-default

        # Referenced Instance Secrets
      dataosSecrets:
      - name: dataos-ctl-user-apikey
        allKeys: true
        consumptionType: envVars

        # Stack-specific section
      stackSpec:
        arguments:
          - resource
          - get
          - -t
          - workflow
          - -n
          - temp001 # Name of the Workflow to fetch
          - -w
          - public
    dependencies:
     - create-workflow # Second Job dependent on successful execution of First Job
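
Each job's stackSpec.arguments array is, in effect, the argument list handed to dataos-ctl inside the job container, with the inline manifest made available at /etc/dataos/config/manifest.yaml. The two jobs above therefore correspond to the following commands, shown here only to clarify the mapping:

# First Job (create-workflow) is equivalent to:
dataos-ctl resource apply -f /etc/dataos/config/manifest.yaml -w public

# Second Job (get-workflow) is equivalent to:
dataos-ctl resource get -t workflow -n temp001 -w public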

Apply the Workflow manifest

dataos-ctl apply -f ${workflow yaml file path} -w ${workspace name}

# Example
dataos-ctl apply -f workflow/workflow_lifecycle.yml -w public

# Expected Output
INFO[0000] πŸ›  apply...                                   
INFO[0000] πŸ”§ applying(public) dataos-ctl-workflow-lifecycle-02:v1:workflow... 
INFO[0003] πŸ”§ applying(public) dataos-ctl-workflow-lifecycle-02:v1:workflow...created 
INFO[0003] πŸ›  apply...complete                          

Verify Workflow creation

dataos-ctl get -t workflow -w ${workspace name}

# Sample Output
INFO[0000] πŸ” get...                                     
INFO[0001] πŸ” get...complete                             

                NAME               | VERSION |   TYPE   | WORKSPACE | STATUS |  RUNTIME  |     OWNER       
-----------------------------------|---------|----------|-----------|--------|-----------|-----------------
  dataos-ctl-workflow-lifecycle-02 | v1      | workflow | public    | active | succeeded | iamgroot
  wf-tmdc-01                       | v1      | workflow | public    | active | succeeded | iamgroot
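
To drill into the runtime status of a single Workflow, you can also address it by name. The flags below follow the same pattern as the get command above (add -r to refresh continuously, if your CLI version supports it):

dataos-ctl get runtime -t workflow -w public -n dataos-ctl-workflow-lifecycle-02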

Check Workflow Logs to validate execution

Copy the NAME through WORKSPACE portion of the row from the output table of the get command and use it as the identifier string in the log command.

dataos-ctl -i "${name to workspace string copied from the get command output}" --node ${node name from the get runtime command} log

# Example
dataos-ctl -i "dataos-ctl-workflow-lifecycle-02 | v1      | workflow | public" log
# Expected Output

INFO[0000] πŸ“ƒ log(public)...                             
INFO[0000] πŸ“ƒ log(public)...complete                     

       NODE NAME       β”‚ CONTAINER NAME β”‚ ERROR  
───────────────────────┼────────────────┼────────
  get-workflow-execute β”‚ main           β”‚        

-------------------LOGS-------------------
time="2024-06-18T12:54:38Z" level=info msg="πŸ” get..."
time="2024-06-18T12:54:39Z" level=info msg="πŸ” get...nothing"
time="2024-06-18T12:54:39.818Z" level=info msg="sub-process exited" argo=true error="<nil>"

         NODE NAME        β”‚ CONTAINER NAME β”‚ ERROR  
──────────────────────────┼────────────────┼────────
  create-workflow-execute β”‚ main           β”‚        

-------------------LOGS-------------------
time="2024-06-18T12:54:18Z" level=info msg="πŸ›  apply... "
time="2024-06-18T12:54:18Z" level=info msg="πŸ”§ applying(public) wf-tmdc-01:v1:workflow..."
time="2024-06-18T12:54:24Z" level=info msg="πŸ”§ applying(public) wf-tmdc-01:v1:workflow...updated"
time="2024-06-18T12:54:24Z" level=info msg="πŸ›  apply...complete"
time="2024-06-18T12:54:24.734Z" level=info msg="sub-process exited" argo=true error="<nil>"