Monitor: First Steps

This quick start guide uses a real-world scenario to demonstrate how the Monitor Resource can be used in DataOS. It walks you through setting up a Monitor Resource using the DataOS CLI.

Follow the sections of the quick start guide in order.

Problem Statement

Assume you are a data engineer for a rapidly growing online application that has recently launched a high-profile marketing campaign. The application is gaining new users at an unexpected rate, and it's crucial to ensure the system scales smoothly to handle the influx without impacting performance or user experience.

To address this, we can use the Monitor Resource in DataOS to track metrics or events and generate incidents when predefined conditions are met. This quick start shows how to set up an Equation Monitor that generates an incident when the number of user registrations (rows in the customer database table) exceeds a threshold beyond which resources could be strained or performance could degrade.

Scenario Details

  • Database Table: icebase.retail.customers
  • Metric: Row count in the customers table
  • Threshold: 100,000 rows
  • Condition: Row count exceeds 100,000 rows
  • Incident: Output generated by the Monitor when the above condition is met.

Prerequisites

Before we create our first Monitor, we need to make sure the prerequisites for creating a Monitor in DataOS are in place.

Logged into DataOS CLI

Make sure the DataOS Command Line Interface (CLI) is set up on your local system and that you are logged in before proceeding. See Setting up CLI.

Access Permissions

Make sure you have the appropriate access permissions to create and manage a Monitor.

Now you have everything necessary to start using Monitor in DataOS.

Steps

Step 1: Create a manifest file of a Monitor

A Monitor is a type of Resource in DataOS, so you create a Monitor Resource instance by applying its manifest file using the DataOS CLI. The manifest file for the Monitor is provided below:

# RESOURCE META SECTION
name: inventory-level-monitor
version: v1alpha
type: monitor
tags:
    - dataos:type:resource
    - dataos:type:cluster-resource
    - dataos:resource:policy
    - dataos:layer:user
description: Table row count threshold

# MONITOR-SPECIFIC SECTION
monitor:
    # Schedule
    schedule: '*/2 * * * *'
    # Equation
    equation:
        # Left-hand side expression
        left_expression:
            query_coefficient: 1
            query_constant: 0
            query:
                type: trino
                cluster: minithemis
                ql: SELECT CASE WHEN check_outcome = 'pass' THEN 1 WHEN check_outcome = 'fail' THEN 0 END FROM icebase.soda.soda_check_metrics_01 WHERE metric_name = 'row_count' AND dataset = 'customer' ORDER BY timestamp DESC LIMIT 1;
        # Right-hand side expression
        right_expression:
            query_coefficient: 0
            query_constant: 0
        operator: equals
    # Incident
    incident:
        asset_name: customer
        incident_type: rowcount
        severity: medium
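A malformed schedule string is easy to miss in a manifest. The following minimal sketch checks the field count of a schedule before the manifest is applied. It assumes the schedule is a standard five-field cron expression, which is how the schedule appears in the incident payload later in this guide; `split_schedule` is a hypothetical helper and not part of DataOS.

```python
def split_schedule(expr):
    """Split a cron-style schedule into its fields.

    Assumption: the Monitor schedule is a five-field cron expression
    (minute, hour, day of month, month, day of week).
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(
            "expected 5 cron fields, got %d: %r" % (len(fields), expr)
        )
    return fields


# A schedule that fires every two minutes.
print(split_schedule("*/2 * * * *"))  # ['*/2', '*', '*', '*', '*']
```

A stray space, as in `* /2 * * * *`, produces six fields and is rejected by this check.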

As this is your first Monitor, we have tried to keep it as simple as possible. Certain attributes need to be configured within the monitor manifest file.

  • name: An appropriate name for the Monitor Resource.
  • tags: Optional tags for the Monitor Resource. You can provide as many as you want; they help with searchability and indexing.
  • description: The Monitor Resource's description. You can use the provided value or write your own.
  • schedule: The frequency or cadence at which the Monitor checks the condition.
  • leftExpression / rightExpression: The left-hand side and the right-hand side of the equation. Each expression has the form QUERY_COEFFICIENT * QUERY + QUERY_CONSTANT.
  • query_coefficient: The multiplier applied to the query result.
  • query_constant: A constant value that is added to the product of the query coefficient and the query result.
  • operator: Compares the result of the left-hand side with the result of the right-hand side of the equation.
  • incident: Key-value pairs to be included in the message generated by the Monitor.
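To make the expression format concrete, here is a minimal Python sketch of how an equation of the form QUERY_COEFFICIENT * QUERY + QUERY_CONSTANT could be evaluated and compared. It illustrates the arithmetic only and is not the Monitor's actual implementation. The function names are hypothetical; the operator names `equals` and `less_than` appear in this guide, while `greater_than` is assumed for illustration.

```python
import operator

# Operator names mapped to comparison functions.
# `equals` and `less_than` appear in this guide; `greater_than` is an assumption.
OPERATORS = {
    "equals": operator.eq,
    "less_than": operator.lt,
    "greater_than": operator.gt,
}


def expression_value(query_coefficient, query_constant, query_result=0):
    """Evaluate QUERY_COEFFICIENT * QUERY + QUERY_CONSTANT for one side."""
    return query_coefficient * query_result + query_constant


def condition_met(left, right, op):
    """Compare both sides of the equation with the named operator."""
    return OPERATORS[op](left, right)


# Left side: the raw query result (a row count of 53375).
left = expression_value(1, 0, query_result=53375)
# Right side: a coefficient of 0 discards the query, leaving only the constant.
right = expression_value(0, 100000)

print(left, right, condition_met(left, right, "less_than"))  # 53375 100000 True
```

Setting the right-hand coefficient to 0 is how a fixed threshold is expressed: the query term vanishes and only the constant remains.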

Step 2: Check the Monitor Equation

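Before applying the manifest, you can evaluate the Monitor's equation against the manifest file with the following command:

dataos-ctl develop observability monitor equation -f testing/manifest/monitor/new_monitor.yml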
INFO[0000] 🔮 develop observability...
INFO[0000] 🔮 develop observability...monitor tcp-stream...starting
INFO[0001] 🔮 develop observability...monitor tcp-stream...running
INFO[0002] 🔮 develop observability...monitor tcp-stream...stopping
INFO[0002] 🔮 context cancelled, monitor tcp-stream is closing.
INFO[0003] 🔮 develop observability...complete

RESULT (maxRows: 10, totalRows:1): 🟩 monitor condition met

  EXP VAL (LEFT) |    OP     | EXP VAL (RIGHT) | COL0 (LEFT-COMP)  | CONSTANT (RIGHT-COMP)  
-----------------|-----------|-----------------|-------------------|------------------------
  53375.00       | less_than | 100000.00       | 53375.00          | 1.00                   

Step 3: Apply the Monitor manifest through CLI

Once you have created your Monitor manifest file, you will need to apply it within the DataOS environment to create the Monitor Resource-Instance. You can do this using the Command Line Interface (CLI) with the following commands:

dataos-ctl resource apply -f ${manifest-file-path} -w ${workspace-name}

Alternatively, this task can also be accomplished using a simpler command. Both commands are equivalent, and you can use either one depending on your preference:

dataos-ctl apply -f ${manifest-file-path} -w ${workspace-name}

Example

Here is an example of how to apply a Monitor manifest file located at /home/monitor/incident-monitor.yml to the curriculum workspace:

dataos-ctl resource apply -f /home/monitor/incident-monitor.yml -w curriculum

Expected Output

After running the command, you should see an output similar to the following, indicating that the Monitor Resource instance has been applied.

# Expected Output
INFO[0000] 🔍 apply...
INFO[0001] 🔍 applying(curriculum) cpu-usage-spike:v1alpha:monitor...
INFO[0002] 🔍 applying(curriculum) cpu-usage-spike:v1alpha:monitor...created
INFO[0003] 🔍 apply...complete
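For scripted deployments, the apply command can be assembled programmatically. The sketch below only builds and prints the argument list, using the `resource apply`, `-f`, and `-w` arguments shown in this step; `build_apply_command` is a hypothetical helper, and the sketch does not execute the CLI.

```python
import shlex


def build_apply_command(manifest_path, workspace):
    """Return the dataos-ctl apply command as an argument list."""
    return [
        "dataos-ctl", "resource", "apply",
        "-f", manifest_path,
        "-w", workspace,
    ]


cmd = build_apply_command("/home/monitor/incident-monitor.yml", "curriculum")
print(shlex.join(cmd))
# dataos-ctl resource apply -f /home/monitor/incident-monitor.yml -w curriculum
```

Keeping the command as a list avoids shell quoting issues if the list is later passed to `subprocess.run(cmd, check=True)`.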

Step 4: Verify Monitor Status

Use the following command to list all existing Monitors in a workspace, across all owners.

dataos-ctl resource get -t monitor -w ${workspace-name} -a

Sample

dataos-ctl resource get -t monitor -w curriculum -a

# Expected output
INFO[0000] 🔍 get...
INFO[0000] 🔍 get...complete
       NAME     | VERSION |  TYPE   | WORKSPACE  | STATUS | RUNTIME | OWNER      
----------------|---------|---------|------------|--------|---------|----------------
    my-monitor  | v1alpha | monitor | curriculum | active |         | iamgroot 
    monitor101  | v1alpha | monitor | curriculum | active |         | thor

You can also access the details of any created Monitor through the DataOS GUI in the Operations App.

Step 5: Get the Runtime status of the Monitor

dataos-ctl get runtime -t monitor -w public -n monitorthemisnew01
# Expected output
dataos-ctl resource get runtime -t monitor -w public -n monitorthemisnew01
INFO[0000] 🔍 monitor...
INFO[0001] 🔍 monitor...complete

         NAME        | VERSION |  TYPE   | WORKSPACE |    OWNER     
---------------------|---------|---------|-----------|--------------
  monitorthemisnew01 | v1alpha | monitor | public    | piyushjoshi  

  STATUS |            RUNTIME              
---------|---------------------------------
  active | next:2024-05-07T18:00:00+05:30  

     RUN ID    |          STARTED          |         FINISHED          | RUN STATUS |                                               RESULT                                                
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
  dkzlw6v36br8 | 2024-05-07T17:58:00+05:30 | 2024-05-07T17:58:00+05:30 | completed  | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzlw8l3nksg'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
  dkzlprlu3bpg | 2024-05-07T17:56:00+05:30 | 2024-05-07T17:56:00+05:30 | completed  | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzlpt9k5xc0'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
  dkzljce3y39e | 2024-05-07T17:54:00+05:30 | 2024-05-07T17:54:00+05:30 | completed  | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzlje87rcow'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
  dkzlcx5wbke8 | 2024-05-07T17:52:00+05:30 | 2024-05-07T17:52:01+05:30 | completed  | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzlcz1e2sqo'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
  dkzl6hy16j9f | 2024-05-07T17:50:00+05:30 | 2024-05-07T17:50:01+05:30 | completed  | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzl6jwpt3i8'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
  dkzl02qdj75u | 2024-05-07T17:48:00+05:30 | 2024-05-07T17:48:00+05:30 | completed  | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzl03nt69s0'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
  dkzktni3erya | 2024-05-07T17:46:00+05:30 | 2024-05-07T17:46:00+05:30 | completed  | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzktoef3h1c'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
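The run times in the output above are two minutes apart, which is what an every-two-minutes schedule produces. As a rough illustration, the following Python sketch computes the next firing time for a minute-step cron schedule of the form `*/N * * * *`. The helper `next_run` is hypothetical and handles only this one schedule shape; it is not how DataOS computes the `next:` value.

```python
from datetime import datetime, timedelta, timezone


def next_run(after, every_minutes=2):
    """Next firing time of a '*/N * * * *' schedule strictly after `after`.

    Cron step values restart at minute 0 of every hour, so a candidate
    minute of 60 or more rolls over to the top of the next hour.
    """
    base = after.replace(second=0, microsecond=0)
    candidate = (base.minute // every_minutes + 1) * every_minutes
    if candidate >= 60:
        return base.replace(minute=0) + timedelta(hours=1)
    return base.replace(minute=candidate)


# The most recent run in the output above started at 17:58 (+05:30).
IST = timezone(timedelta(hours=5, minutes=30))
last_run = datetime(2024, 5, 7, 17, 58, tzinfo=IST)
print(next_run(last_run).isoformat())  # 2024-05-07T18:00:00+05:30
```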

Step 6: Check for Incident Messages

To check an incident message, use the following command with the incident ID reported in the RESULT column of the runtime output:

dataos-ctl develop observability incident -i ${incident-id}

dataos-ctl develop observability incident -i dkzlw8l3nksg
INFO[0000] 🔮 develop observability...
INFO[0000] 🔮 develop observability...monitor tcp-stream...starting
INFO[0001] 🔮 develop observability...monitor tcp-stream...running
INFO[0002] 🔮 develop observability...monitor tcp-stream...stopping
INFO[0002] 🔮 context cancelled, monitor tcp-stream is closing.
INFO[0003] 🔮 develop observability...complete

{
  "id": "dkzlw8l3nksg",
  "createTime": "2024-05-07T12:28:00.931693759Z",
  "properties": {
    "category": "test",
    "incidentType": "resource_consumption",
    "name": "monitor-incident",
    "severity": "high",
    "stuff": "stuff",
    "summary": "some summary",
    "type": "pulsar"
  },
  "equationContext": {
    "queryExpressions": [
      {
        "leftExpressionValue": "53375.00",
        "rightExpressionValue": "100000.00",
        "leftRow": {
          "comparisonColumn": {
            "name": "_col0",
            "value": "53375.00"
          }
        },
        "rightRow": {
          "comparisonColumn": {
            "name": "constant",
            "value": "1.00"
          }
        }
      }
    ]
  },
  "monitor": {
    "id": "monitorthemisnew01_public",
    "name": "monitorthemisnew01",
    "description": "table row count threshold",
    "schedule": "*/2 * * * *",
    "timezone": "UTC",
    "type": "equation_monitor",
    "equationMonitor": {
      "leftExpression": {
        "queryCoefficient": 1,
        "queryConstant": 0,
        "query": {
          "type": "trino",
          "cluster": "system",
          "ql": "select count(*) from \"icebase\".\"retail\".city"
        }
      },
      "rightExpression": {
        "queryCoefficient": 0,
        "queryConstant": 100000
      },
      "operator": "less_than"
    }
  }
}
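If the incident payload needs to be processed downstream, the fields of interest can be pulled out with a few lines of Python. The sketch below parses an abbreviated copy of the payload above; `summarize_incident` is a hypothetical helper, and the field names are taken from this sample output only, so other incidents may carry different properties.

```python
import json

# Abbreviated copy of the incident payload shown above.
incident_json = """
{
  "id": "dkzlw8l3nksg",
  "properties": {"incidentType": "resource_consumption", "severity": "high"},
  "equationContext": {
    "queryExpressions": [
      {"leftExpressionValue": "53375.00", "rightExpressionValue": "100000.00"}
    ]
  },
  "monitor": {
    "name": "monitorthemisnew01",
    "equationMonitor": {"operator": "less_than"}
  }
}
"""


def summarize_incident(raw):
    """Extract the monitor name, severity, and evaluated equation."""
    incident = json.loads(raw)
    expr = incident["equationContext"]["queryExpressions"][0]
    return {
        "incident_id": incident["id"],
        "monitor": incident["monitor"]["name"],
        "severity": incident["properties"]["severity"],
        "left": float(expr["leftExpressionValue"]),
        "operator": incident["monitor"]["equationMonitor"]["operator"],
        "right": float(expr["rightExpressionValue"]),
    }


summary = summarize_incident(incident_json)
print("{monitor}: {left} {operator} {right} (severity={severity})".format(**summary))
# monitorthemisnew01: 53375.0 less_than 100000.0 (severity=high)
```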

Next steps

Your next steps depend upon whether you want to set up alerts on top of the incidents generated using Monitor or want to learn about the different types and key concepts associated with Monitor: