
Steps to create a streaming Monitor

Follow the steps below to create a streaming Monitor:

Prerequisites

Before creating a streaming Monitor, ensure that the following prerequisites are met.

Logged into DataOS CLI:

Ensure that the DataOS Command Line Interface is set up on the local system and that the user is logged in before proceeding. See Setting up CLI.

Permissions:

Verify that the appropriate permissions are available to create and manage a Monitor.

Once the prerequisites are satisfied, Monitor Resources in DataOS can be created.

Step 1: Create a Monitor manifest file

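Copy and edit the manifest file below to create a streaming Monitor. The first manifest is a template: replace each ${{...}} placeholder with an actual value. The second is a filled-in example. It reads pod events from the persistent://system/poros/pods topic and raises a warning incident when a message matches the listed conditions. The runtime and status values described after the manifests apply to report-based Monitors. To observe status instead of runtime in such a Monitor, replace runtime with status in dataOsInstance.path.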

Template:

name: ${{monitor-name}}
version: v1beta
type: worker
tags:
  - pod
  - stream
  - monitor
description: ${{Add a short description}}
worker:
  stack: stream-monitor
  compute: ${{compute-name}}
  replicas: 1
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: 200m
      memory: 200Mi
  stackSpec:
    type: stream_monitor
    incident:
      type: ${{incident-type}}
      name: ${{incident-name}}
      summary: ${{incident-summary}}
      category: stream
      severity: ${{severity, e.g., warning}}
    stream:
      source:
        fastbase:
          topic: ${{fastbase_topic, e.g., persistent://system/poros/pods}}
          maxWorkers: 100
          subscriptionName: ${{subscription-name, e.g., pods-stream-monitor-125-001}}
          subscriptionPosition: ${{subscription-position, e.g., latest}}
      conditions:
        - valueJqFilter: ${{condition-1-filter}}       # e.g., '.producer_name'
          operator: ${{condition-1-operator}}          # e.g., equals
          value: ${{condition-1-value}}                # e.g., 'poros-recorder-1-hub'
        - valueJqFilter: ${{condition-2-filter}}       # e.g., '.properties.dataplane'
          operator: ${{condition-2-operator}}          # e.g., equals
          value: ${{condition-2-value}}                # e.g., 'hub'
        - valueJqFilter: ${{condition-3-filter}}       # e.g., '.key'
          operator: ${{condition-3-operator}}          # e.g., contains
          value: ${{condition-3-value}}                # e.g., 'delete:pods:juicefs'
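Every ${{...}} placeholder in the template must be replaced before the manifest is applied. The Python sketch below fills placeholders from a dictionary of values and reports any that remain unresolved. It is not part of DataOS tooling. The function name fill_placeholders is hypothetical, and so is the lookup convention: the key is the text before the first comma, so ${{severity, e.g., warning}} is looked up as severity.

```python
import re

# Matches template placeholders such as ${{monitor-name}} or
# ${{severity, e.g., warning}}; group 1 is the text inside the braces.
PLACEHOLDER = re.compile(r"\$\{\{([^}]*)\}\}")


def fill_placeholders(template: str, values: dict[str, str]) -> tuple[str, list[str]]:
    """Replace known placeholders and return (filled_text, unresolved_keys).

    Hypothetical convention: the lookup key is the text before the first
    comma, stripped, so ${{severity, e.g., warning}} becomes "severity".
    """
    unresolved: list[str] = []

    def substitute(match: re.Match) -> str:
        key = match.group(1).split(",", 1)[0].strip()
        if key in values:
            return values[key]
        unresolved.append(key)
        return match.group(0)  # leave unknown placeholders visible in the output

    return PLACEHOLDER.sub(substitute, template), unresolved


template = """name: ${{monitor-name}}
worker:
  compute: ${{compute-name}}
  stackSpec:
    incident:
      severity: ${{severity, e.g., warning}}
"""

filled, missing = fill_placeholders(
    template, {"monitor-name": "pods-stream-monitor", "severity": "warning"}
)
print(filled)
print("unresolved:", missing)  # unresolved: ['compute-name']
```

Unknown placeholders are left untouched rather than blanked. If the filled manifest is applied by mistake, the problem stays visible.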

Example:

name: pods-stream-monitor
version: v1beta
type: worker
tags:
  - pod
  - stream
  - monitor
description: testing the pods stream monitors
worker:
  stack: stream-monitor
  compute: azure-s-runnable
  replicas: 1
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: 200m
      memory: 200Mi
  stackSpec:
    type: stream_monitor
    incident:
      type: pulsar
      name: stream-monitor-incident
      summary: 'stream monitor incident, found a specific pod'
      category: stream
      severity: warning
    stream:
      source:
        fastbase:
          topic: persistent://system/poros/pods
          maxWorkers: 100
          subscriptionName: pods-stream-monitor-125-001
          subscriptionPosition: latest
      conditions:
        - valueJqFilter: '.producer_name'
          operator: equals
          value: 'poros-recorder-1-hub'
        - valueJqFilter: '.properties.dataplane'
          operator: equals
          value: 'hub'
        - valueJqFilter: '.key'
          operator: contains
          value: 'delete:pods:juicefs'
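The three conditions in the example are easier to reason about when simulated locally. The Python sketch below is a hypothetical model, not the stream-monitor stack's implementation. It treats valueJqFilter as a simple dotted path (a small subset of jq) and supports only the equals and contains operators used above. It also assumes that an incident is raised only when all conditions match. The sample message is made up; its fields mirror the filters.

```python
import json

# Hypothetical local model of the example's conditions (not DataOS code).
CONDITIONS = [
    {"valueJqFilter": ".producer_name", "operator": "equals", "value": "poros-recorder-1-hub"},
    {"valueJqFilter": ".properties.dataplane", "operator": "equals", "value": "hub"},
    {"valueJqFilter": ".key", "operator": "contains", "value": "delete:pods:juicefs"},
]


def extract(message: dict, path: str):
    """Resolve a dotted path such as '.properties.dataplane'; None if absent."""
    current = message
    for part in path.lstrip(".").split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches(message: dict, condition: dict) -> bool:
    """Apply one condition's operator to the value its filter extracts."""
    observed = extract(message, condition["valueJqFilter"])
    if observed is None:
        return False
    if condition["operator"] == "equals":
        return observed == condition["value"]
    if condition["operator"] == "contains":
        return condition["value"] in str(observed)
    raise ValueError(f"operator not covered by this sketch: {condition['operator']}")


def should_raise_incident(message: dict) -> bool:
    # Assumption: every condition must match for an incident to be raised.
    return all(matches(message, c) for c in CONDITIONS)


# Made-up message shaped after the filters above.
event = json.loads(
    '{"producer_name": "poros-recorder-1-hub",'
    ' "properties": {"dataplane": "hub"},'
    ' "key": "delete:pods:juicefs-csi-node-abc12"}'
)
print(should_raise_incident(event))  # True
```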

Possible values for runtime and status

| Attribute | Description | Common values |
|---|---|---|
| runtime | Indicates the execution state or duration of a Resource. | running, succeeded, failed, or a time duration (e.g., 5m, 1h) |
| status | Represents the operational status of a Resource. | active, error, deleted |

Attributes description:

| Attribute | Description |
|---|---|
| monitor.report.source.dataOsInstance.path | API path to the report data source to be monitored. |
| monitor.report.conditions | List of conditions to evaluate against the report data. |
| monitor.report.conditions[].valueComparison.observationType | The type of observation to check (e.g., runtime, status). |
| monitor.report.conditions[].valueComparison.valueJqFilter | JQ filter to extract the value from the report data (e.g., '.value'). |
| monitor.report.conditions[].valueComparison.operator | Operator used for comparison (e.g., equals, less_than, greater_than). |
| monitor.report.conditions[].valueComparison.value | The value to compare against (e.g., failed). |
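The Python sketch below illustrates how one valueComparison entry from the table above could be evaluated. Several parts are assumptions: the record shape ({"value": ...}), the restriction of valueJqFilter to a single top-level key, and the mapping of equals, less_than, and greater_than onto Python comparisons. observationType is carried in the dictionary but is not used by the sketch.

```python
# Hypothetical evaluation of a report Monitor valueComparison (not DataOS code).
OPERATORS = {
    "equals": lambda observed, expected: observed == expected,
    "less_than": lambda observed, expected: observed < expected,
    "greater_than": lambda observed, expected: observed > expected,
}


def evaluate(report: dict, comparison: dict) -> bool:
    """Return True when the extracted value satisfies the comparison."""
    # Stand-in for valueJqFilter: only a single top-level key like '.value'.
    key = comparison["valueJqFilter"].lstrip(".")
    observed = report.get(key)
    if observed is None:
        return False
    return OPERATORS[comparison["operator"]](observed, comparison["value"])


comparison = {
    "observationType": "runtime",  # carried along, unused by this sketch
    "valueJqFilter": ".value",
    "operator": "equals",
    "value": "failed",
}
print(evaluate({"value": "failed"}, comparison))     # True
print(evaluate({"value": "succeeded"}, comparison))  # False
```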

For details on each attribute, refer to the configuration section.

Step 2: Apply the Monitor manifest through CLI

Once the Monitor manifest file is created, apply it in the DataOS environment to create the Monitor Resource. Use the following Command Line Interface (CLI) command:

dataos-ctl resource apply -f ${manifest-file-path} -w ${workspace-name}

Alternatively, use the shorter command below. Both commands are equivalent:

dataos-ctl apply -f ${manifest-file-path} -w ${workspace-name}

Example:

Below is an example of how to apply a Monitor manifest file located at /home/monitor/incident-monitor.yml to the curriculum workspace:

dataos-ctl resource apply -f /home/monitor/incident-monitor.yml -w curriculum
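If the apply step is scripted (for example, in CI), the command above can be built as an argument list and run through Python's subprocess module. This avoids shell quoting issues with paths. This is a sketch: apply_command and apply_manifest are hypothetical helper names, and only the flags shown in this section (-f, -w) are used.

```python
import shutil
import subprocess


def apply_command(manifest_path: str, workspace: str) -> list[str]:
    """Build the 'dataos-ctl resource apply' command shown above as an argument list."""
    return ["dataos-ctl", "resource", "apply", "-f", manifest_path, "-w", workspace]


def apply_manifest(manifest_path: str, workspace: str) -> str:
    """Run the apply command; raises CalledProcessError if the CLI exits non-zero."""
    if shutil.which("dataos-ctl") is None:
        raise FileNotFoundError("dataos-ctl is not on PATH; see the CLI prerequisite")
    result = subprocess.run(
        apply_command(manifest_path, workspace),
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: the INFO log lines may go to either stream, so return both.
    return result.stdout + result.stderr


print(" ".join(apply_command("/home/monitor/incident-monitor.yml", "curriculum")))
# dataos-ctl resource apply -f /home/monitor/incident-monitor.yml -w curriculum
```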

Expected output:

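After running the command, output similar to the following is displayed, indicating that the Resource has been applied. The sample below was captured for a different Monitor (cpu-usage-spike). In the actual output, the name, version, and type correspond to the applied manifest.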

# Expected Output
INFO[0000] 🔍 apply...                                     
INFO[0001] 🔍 applying(curriculum) cpu-usage-spike:v1alpha:monitor...
INFO[0002] 🔍 applying(curriculum) cpu-usage-spike:v1alpha:monitor...created                                     
INFO[0003] 🔍 apply...complete
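To check the result programmatically, scan the expected output above for the per-Resource result word and the final apply...complete line. The Python sketch below assumes the log format of that sample, which may differ between CLI versions. The regular expression and helper names are illustrative only.

```python
import re

# Sample copied from the expected output above; real wording may vary by CLI version.
SAMPLE_OUTPUT = """\
INFO[0000] 🔍 apply...
INFO[0001] 🔍 applying(curriculum) cpu-usage-spike:v1alpha:monitor...
INFO[0002] 🔍 applying(curriculum) cpu-usage-spike:v1alpha:monitor...created
INFO[0003] 🔍 apply...complete
"""

# Matches "applying(<workspace>) <name:version:type>...<result>" at the end of a line.
RESULT_LINE = re.compile(
    r"applying\((?P<workspace>[^)]+)\) (?P<resource>\S+?)\.\.\.(?P<result>\w+)\s*$"
)


def apply_results(output: str) -> dict[str, str]:
    """Map each applied resource id to its result word (e.g. 'created')."""
    results = {}
    for line in output.splitlines():
        match = RESULT_LINE.search(line)
        if match:
            results[match.group("resource")] = match.group("result")
    return results


def apply_completed(output: str) -> bool:
    """True when the final 'apply...complete' line is present."""
    return "apply...complete" in output


print(apply_results(SAMPLE_OUTPUT))    # {'cpu-usage-spike:v1alpha:monitor': 'created'}
print(apply_completed(SAMPLE_OUTPUT))  # True
```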

Step 3: Verify Monitor status

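Use the following command to check the current state of the Monitor Resource and verify that it is configured correctly. Because the streaming Monitor manifest above declares type: worker, the Resource may be listed under the Worker type. If it does not appear with -t Monitor, try -t worker instead.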

dataos-ctl resource get -t Monitor -w ${{workspace-name}} -n ${{monitor-name}} -r

For example:

dataos-ctl resource get -t Monitor -w curriculum -n runtime-monitor1 -r

# Expected output
INFO[0000] 🔍 get...
INFO[0001] 🔍 get...complete

        NAME       | VERSION |  TYPE   | WORKSPACE  | STATUS |            RUNTIME             |    OWNER
-------------------|---------|---------|------------|--------|--------------------------------|--------------
  runtime-monitor1 | v1alpha | Monitor | curriculum | active | next:2025-07-28T13:40:00+05:30 | iamgroot
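The table printed by the get command can also be read from a script, for example to check that STATUS is active. The Python sketch below splits each row on the | separator and keys the cells by the lower-cased header names. It assumes the layout of the sample output above and that no cell contains a | character. parse_get_table is a hypothetical helper name.

```python
# Sample table in the layout shown above.
SAMPLE_TABLE = """\
        NAME       | VERSION |  TYPE   | WORKSPACE  | STATUS |            RUNTIME             |    OWNER
-------------------|---------|---------|------------|--------|--------------------------------|--------------
  runtime-monitor1 | v1alpha | Monitor | curriculum | active | next:2025-07-28T13:40:00+05:30 | iamgroot
"""


def parse_get_table(output: str) -> list[dict[str, str]]:
    """Return one dict per data row, keyed by the lower-cased column headers."""
    lines = [line for line in output.splitlines() if "|" in line]  # drops INFO lines
    if not lines:
        return []
    headers = [cell.strip().lower() for cell in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        if set(line.replace("|", "").strip()) == {"-"}:
            continue  # skip the dashed separator row
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(headers, cells)))
    return rows


rows = parse_get_table(SAMPLE_TABLE)
print(rows[0]["status"], rows[0]["runtime"])  # active next:2025-07-28T13:40:00+05:30
```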

Details of the created Monitor can also be viewed in the DataOS GUI, in the Metis Catalog and the Operations App.

Operations App

Metis Catalog


Next steps

After creating the streaming Monitor, the next step is to configure a Pager Resource. The Pager Resource sends alerts to the configured Teams channels or email addresses whenever the incident condition is met.