Scheduled or Cron Workflow
The code snippet below illustrates a sample scheduled Workflow for profiling using the Flare Stack, with output depots in the Iceberg file format, Hadoop Catalog type, and REST Metastore. To understand the schedule-related attributes in detail, click here.
The snippet defines a scheduling configuration for a workflow, setting specific rules for when and how the workflow should execute. An extended description of each attribute in the schedule follows the snippet.
This setup ensures the workflow is triggered every 2 minutes, regardless of the hour, day, month, or weekday.
# Resource Section
name: scheduled-job-workflow
version: v1
type: workflow
tags:
  - eventhub
  - write
description: this job reads data from thirdparty and writes to eventhub
owner: iamgroot
# Workflow-specific Section
workflow:
  title: scheduled
  schedule:
    cron: '*/2 * * * *' # every 2 minutes [minute, hour, day of the month, month, day of the week]
    concurrencyPolicy: Allow # forbid/replace
    endOn: 2024-11-01T23:40:45Z
    timezone: Asia/Kolkata
  dag:
    - name: write-snowflake-02
      title: Reading data and writing to snowflake
      description: This job writes data to snowflake
      spec:
        tags:
          - Connect
          - write
        stack: flare:5.0
        compute: runnable-default
        stackSpec:
          job:
            explain: true
            inputs:
              - name: poros_workflows
                dataset: dataos://systemstreams:poros/workflows
                isStream: true
                options:
                  startingOffsets: earliest
            logLevel: INFO
            outputs:
              - name: poros_workflows
                dataset: dataos://icebase:sys09/poros_workflows_pulsar?acl=rw
                format: Iceberg
                options:
                  saveMode: overwrite
            options:
              SSL: "true"
              driver: "io.trino.jdbc.TrinoDriver"
              cluster: "system"
This code snippet runs the workflow every 2 minutes until the configured end time, reading data from a Pulsar topic and writing it to Icebase. Each schedule attribute is explained in detail below.
cron: '*/2 * * * *'
This cron expression specifies the frequency of the workflow execution. In this case:
*/2 in the minute field means the workflow will run every 2 minutes.
The remaining four fields (* * * *) mean that this schedule applies to every hour of the day, every day of the month, every month of the year, and every day of the week.
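To make the field semantics concrete, the following minimal sketch expands the `*` and `*/n` forms of a cron field into the values they match. The helper `expand_cron_field` is hypothetical and written only for illustration; it is not the scheduler's own parser and handles only these two forms.

```python
def expand_cron_field(field, lo, hi):
    """Expand a cron field of the form '*' or '*/n' into the values it matches.

    Hypothetical helper for illustration; real cron parsers also support
    lists, ranges, and names, which are deliberately left out here.
    """
    if field == "*":
        return list(range(lo, hi + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(lo, hi + 1, step))
    raise ValueError(f"unsupported field: {field!r}")


# Field order: minute, hour, day of the month, month, day of the week
minute, hour, day_of_month, month, day_of_week = "*/2 * * * *".split()

minutes = expand_cron_field(minute, 0, 59)
hours = expand_cron_field(hour, 0, 23)

print(minutes[:5], len(minutes))  # [0, 2, 4, 6, 8] 30
print(len(hours))                 # 24
```

The minute field matches the 30 even minutes of each hour, and the unrestricted hour field matches all 24 hours, which together give a run every 2 minutes.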
concurrencyPolicy: Allow
- Allow: Permits multiple instances of the workflow to run concurrently. If a new instance of the workflow is triggered before the previous one finishes, both run simultaneously. To learn more about the other configuration values of the concurrencyPolicy attribute, click here.
endOn: 2024-11-01T23:40:45Z
The endOn attribute defines the expiration time for the schedule. The workflow continues to execute according to the defined cron schedule until this date and time:
2024-11-01T23:40:45Z indicates that the workflow will stop being triggered after 23:40:45 on November 1, 2024. The trailing Z means the time is expressed in Coordinated Universal Time (UTC).
timezone: Asia/Kolkata
The timezone attribute specifies the time zone for the cron schedule. Here, Asia/Kolkata is used:
This means that the times specified in the cron expression will be interpreted in the Asia/Kolkata time zone. The Asia/Kolkata time zone is 5 hours and 30 minutes ahead of UTC.
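As a quick check of the offset, the sketch below converts the endOn timestamp from UTC to Indian Standard Time. It is an illustration only: it assumes a fixed +05:30 offset in place of the Asia/Kolkata zone database entry (India observes no daylight saving time), and the variable names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +05:30 offset stands in for the Asia/Kolkata zone.
IST = timezone(timedelta(hours=5, minutes=30), "IST")

# "+00:00" is the same instant as the trailing "Z" in the manifest.
end_on_utc = datetime.fromisoformat("2024-11-01T23:40:45+00:00")
end_on_ist = end_on_utc.astimezone(IST)

print(end_on_ist.isoformat())  # 2024-11-02T05:10:45+05:30
```

The end time therefore falls on the following calendar day in IST.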
Summary
- The workflow runs every 2 minutes based on Indian Standard Time (IST).
- The workflow continues until 2024-11-02 05:10:45 IST.
- The end time corresponds to 2024-11-01T23:40:45Z in UTC.
- The last scheduled run before the end time will be at 2024-11-02 05:10:00 IST, since 05:10:00 is an even-minute tick that still precedes 05:10:45.
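The arithmetic for the final tick can be checked with a short sketch. It assumes that cron ticks fire at second 0 of every even minute and that a tick is eligible as long as it is not later than endOn; how the scheduler actually treats the cutoff is not specified here. The helper `last_tick_at_or_before` is hypothetical, and a fixed +05:30 offset stands in for Asia/Kolkata.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +05:30 offset stands in for the Asia/Kolkata zone.
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def last_tick_at_or_before(moment, step_minutes):
    """Return the latest whole-minute instant, with a minute value that is a
    multiple of step_minutes, that is not after `moment`."""
    floored = moment.replace(second=0, microsecond=0)
    return floored - timedelta(minutes=floored.minute % step_minutes)


end_on = datetime(2024, 11, 1, 23, 40, 45, tzinfo=timezone.utc).astimezone(IST)
last_run = last_tick_at_or_before(end_on, 2)

print(end_on.isoformat())    # 2024-11-02T05:10:45+05:30
print(last_run.isoformat())  # 2024-11-02T05:10:00+05:30
```

Under these assumptions, the tick at 05:10:00 IST is the last one at or before the end time of 05:10:45 IST.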