Monitor: First Steps¶
This quick start guide uses a small real-world scenario to demonstrate how the Monitor Resource can be used in DataOS. It walks you through setting up a Monitor Resource using the DataOS CLI.
Follow the sections of the quick start guide in order.
Problem Statement¶
Assume you are a data engineer for a rapidly growing online application that has recently launched a high-profile marketing campaign. The application is gaining new users at an unexpected rate, and it's crucial to ensure the system scales smoothly to handle the influx without impacting performance or user experience.
To address this, we can use the Monitor Resource in DataOS to track metrics or events and trigger incidents when predefined conditions are met. This quick start shows you how to set up an Equation Monitor that generates an incident when the number of user registrations (rows in the customer database table) exceeds a threshold that could strain resources or cause performance issues.
Scenario Details¶
- Database Table: icebase.retail.customers
- Metric: Row count in the customers table
- Threshold: 100,000 rows
- Condition: Row count exceeds 100,000 rows
- Incident: Output generated by the Monitor when the above condition is met.
Prerequisites¶
Before creating your first Monitor, make sure the following prerequisites for creating a Monitor in DataOS are in place.
Logged into DataOS CLI¶
Make sure the DataOS Command Line Interface (CLI) is set up on your local system and that you are logged in before proceeding. See Setting up CLI.
Access Permissions¶
Make sure you have the appropriate access permissions to create and manage a Monitor.
Now you have everything necessary to start using Monitor in DataOS.
Steps¶
Step 1: Create a manifest file of a Monitor¶
A Monitor is a type of Resource in DataOS, so you create a Monitor Resource-instance by applying its manifest file using the DataOS CLI. The first step is therefore to create the manifest file. A sample Monitor manifest is provided below:
# RESOURCE META SECTION
name: inventory-level-monitor
version: v1alpha
type: monitor
tags:
  - dataos:type:resource
  - dataos:type:cluster-resource
  - dataos:resource:policy
  - dataos:layer:user
description: Table row count threshold
# MONITOR-SPECIFIC SECTION
monitor:
  # Schedule
  schedule: '*/2 * * * *'
  # Equation
  equation:
    # Left hand side expression
    left_expression:
      query_coefficient: 1
      query_constant: 0
      query:
        type: trino
        cluster: minithemis
        ql: SELECT CASE WHEN check_outcome = 'pass' THEN 1 WHEN check_outcome = 'fail' THEN 0 END FROM icebase.soda.soda_check_metrics_01 WHERE metric_name = 'row_count' AND dataset = 'customer' ORDER BY timestamp DESC LIMIT 1;
    # Right hand side expression
    right_expression:
      query_coefficient: 0
      query_constant: 0
    operator: equals
  # Incident
  incident:
    asset_name: customer
    incident_type: rowcount
    severity: medium
As this is your first Monitor, we have tried to keep it as simple as possible. Certain attributes need to be configured within the monitor manifest file.
Attribute | Description |
---|---|
name | Provide an appropriate name for the Monitor Resource. |
tags | Optional tags for the Monitor Resource. You can provide as many as you want, as they help with searchability and indexing. |
description | The Monitor Resource's description. You can use the provided value or write your own. |
schedule | The frequency, or cadence, at which the Monitor checks the condition, written as a cron expression. |
left_expression / right_expression | The left hand side and the right hand side of the equation. Each expression has the form QUERY_COEFFICIENT*QUERY+QUERY_CONSTANT. |
query_coefficient | The multiplier applied to the query result. |
query_constant | A constant value that is added to the product of the query coefficient and the query result. |
operator | The operator used to compare the results of the left hand side and the right hand side of the equation. |
incident | Key-value pairs to be included in the message generated by the Monitor. |
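To make the expression format concrete, here is a minimal Python sketch of how an equation of this form could be evaluated. It is an illustration only, not the Monitor's actual implementation: the function names are hypothetical, and `greater_than` is an assumed operator name (`equals` and `less_than` are the only operators that appear in this guide).

```python
import operator

# Operator names mapped to comparisons. `equals` and `less_than` appear in
# this guide; `greater_than` is an assumed name used only for illustration.
OPERATORS = {
    "equals": operator.eq,
    "less_than": operator.lt,
    "greater_than": operator.gt,
}


def expression_value(query_coefficient, query_constant, query_result=0.0):
    """Return QUERY_COEFFICIENT * QUERY + QUERY_CONSTANT."""
    return query_coefficient * query_result + query_constant


def condition_met(left, right, op_name):
    """Compare the two expression values with the named operator."""
    return OPERATORS[op_name](left, right)


# Sample manifest: the query returns 0 when the latest row_count check failed,
# and the right hand side reduces to the constant 0.
left = expression_value(query_coefficient=1, query_constant=0, query_result=0)
right = expression_value(query_coefficient=0, query_constant=0)
print(condition_met(left, right, "equals"))  # True
```

With these inputs both sides evaluate to 0, so the `equals` condition holds and an incident would be generated. Setting `query_coefficient` to 0 on the right hand side is what turns that side into a plain constant.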
Step 2: Check the Monitor Equation¶
Before applying the manifest, you can validate the equation from the CLI. The output reports the value of each side of the equation and whether the Monitor condition is met:
dataos-ctl develop observability monitor equation -f testing/manifest/monitor/new_monitor.yml
INFO[0000] 🔮 develop observability...
INFO[0000] 🔮 develop observability...monitor tcp-stream...starting
INFO[0001] 🔮 develop observability...monitor tcp-stream...running
INFO[0002] 🔮 develop observability...monitor tcp-stream...stopping
INFO[0002] 🔮 context cancelled, monitor tcp-stream is closing.
INFO[0003] 🔮 develop observability...complete
RESULT (maxRows: 10, totalRows:1): 🟩 monitor condition met
EXP VAL (LEFT) | OP | EXP VAL (RIGHT) | COL0 (LEFT-COMP) | CONSTANT (RIGHT-COMP)
-----------------|-----------|-----------------|-------------------|------------------------
53375.00 | less_than | 100000.00 | 53375.00 | 1.00
Step 3: Apply the Monitor manifest through CLI¶
Once you have created the Monitor manifest file, apply it in the DataOS environment to create the Monitor Resource-instance. You can do this from the CLI with the following command:
dataos-ctl resource apply -f ${manifest-file-path} -w ${workspace}
Alternatively, you can use the shorter form. Both commands are equivalent, so use whichever you prefer:
dataos-ctl apply -f ${manifest-file-path} -w ${workspace}
Example
Here is an example of how to apply a Monitor manifest file located at /home/monitor/incident-monitor.yml to the curriculum workspace:
dataos-ctl resource apply -f /home/monitor/incident-monitor.yml -w curriculum
Expected Output
After running the command, you should see an output similar to the following, indicating that the Monitor Resource instance has been applied.
# Expected Output
INFO[0000] 🛠 apply...
INFO[0001] 🛠 applying(curriculum) cpu-usage-spike:v1alpha:monitor...
INFO[0002] 🛠 applying(curriculum) cpu-usage-spike:v1alpha:monitor...created
INFO[0003] 🛠 apply...complete
Step 4: Verify Monitor Status¶
Use the following command to list all existing Monitors in a workspace, across all owners.
Sample
dataos-ctl resource get -t monitor -w curriculum -a
# Expected output
INFO[0000] 🔍 get...
INFO[0000] 🔍 get...complete
NAME | VERSION | TYPE | WORKSPACE | STATUS | RUNTIME | OWNER
----------------|---------|---------|------------|--------|---------|----------------
my-monitor | v1alpha | monitor | curriculum | active | | iamgroot
monitor101 | v1alpha | monitor | curriculum | active | | thor
You can also access the details of any created Monitor through the DataOS GUI in the Operations App.
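If you want to check the status from a script, the pipe-delimited table printed by the CLI can be parsed with a few lines of Python. This is a minimal sketch that assumes the column layout shown in the sample output above; `parse_cli_table` is a hypothetical helper, not part of the DataOS CLI.

```python
def parse_cli_table(text):
    """Parse a pipe-delimited table into a list of dicts keyed by header."""
    rows = [line for line in text.strip().splitlines() if "|" in line]
    header = [cell.strip() for cell in rows[0].split("|")]
    records = []
    for line in rows[1:]:
        if set(line.strip()) <= set("-|"):
            continue  # skip the separator row made of dashes and pipes
        cells = [cell.strip() for cell in line.split("|")]
        records.append(dict(zip(header, cells)))
    return records


# Table copied from the sample output above.
SAMPLE = """
      NAME      | VERSION |  TYPE   | WORKSPACE  | STATUS | RUNTIME |     OWNER
----------------|---------|---------|------------|--------|---------|----------------
  my-monitor    | v1alpha | monitor | curriculum | active |         | iamgroot
  monitor101    | v1alpha | monitor | curriculum | active |         | thor
"""

monitors = parse_cli_table(SAMPLE)
active = [m["NAME"] for m in monitors if m["STATUS"] == "active"]
print(active)  # ['my-monitor', 'monitor101']
```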
Step 5: Get the Runtime status of the Monitor¶
dataos-ctl get runtime -t monitor -w public -n monitorthemisnew01
# Expected output
dataos-ctl resource get runtime -t monitor -w public -n monitorthemisnew01
INFO[0000] 🔍 monitor...
INFO[0001] 🔍 monitor...complete
NAME | VERSION | TYPE | WORKSPACE | OWNER
---------------------|---------|---------|-----------|--------------
monitorthemisnew01 | v1alpha | monitor | public | piyushjoshi
STATUS | RUNTIME
---------|---------------------------------
active | next:2024-05-07T18:00:00+05:30
RUN ID | STARTED | FINISHED | RUN STATUS | RESULT
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
dkzlw6v36br8 | 2024-05-07T17:58:00+05:30 | 2024-05-07T17:58:00+05:30 | completed | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzlw8l3nksg'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
dkzlprlu3bpg | 2024-05-07T17:56:00+05:30 | 2024-05-07T17:56:00+05:30 | completed | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzlpt9k5xc0'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
dkzljce3y39e | 2024-05-07T17:54:00+05:30 | 2024-05-07T17:54:00+05:30 | completed | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzlje87rcow'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
dkzlcx5wbke8 | 2024-05-07T17:52:00+05:30 | 2024-05-07T17:52:01+05:30 | completed | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzlcz1e2sqo'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
dkzl6hy16j9f | 2024-05-07T17:50:00+05:30 | 2024-05-07T17:50:01+05:30 | completed | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzl6jwpt3i8'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
dkzl02qdj75u | 2024-05-07T17:48:00+05:30 | 2024-05-07T17:48:00+05:30 | completed | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzl03nt69s0'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
dkzktni3erya | 2024-05-07T17:46:00+05:30 | 2024-05-07T17:46:00+05:30 | completed | 🟩 monitor condition met for monitor: 'monitorthemisnew01_public', created incident 'dkzktoef3h1c'
---------------|---------------------------|---------------------------|------------|-----------------------------------------------------------------------------------------------------
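The runs in this output start two minutes apart, which is what a `*/2 * * * *` schedule produces. The following Python sketch shows how the next run time follows from the last one. It is a simplified model that handles only a minute field of the form `*/N` (with N dividing 60 and the other four cron fields left as `*`), and `next_run` is a hypothetical helper rather than anything DataOS provides.

```python
from datetime import datetime, timedelta


def next_run(after, every_minutes):
    """Return the first time strictly after `after` whose minute is a
    multiple of `every_minutes`.

    Models only a cron minute field of the form */N, for N dividing 60.
    """
    base = after.replace(second=0, microsecond=0)
    step = every_minutes - (base.minute % every_minutes)
    return base + timedelta(minutes=step)


# Start time of the most recent run in the output above.
last_started = datetime.fromisoformat("2024-05-07T17:58:00+05:30")
print(next_run(last_started, 2).isoformat())  # 2024-05-07T18:00:00+05:30
```

The result matches the `next:` value reported in the RUNTIME column.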
Step 6: Check for Incident Messages¶
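To check an incident message, use the following command, passing the ID of the incident reported in the RESULT column of the runtime output (not the run ID):
dataos-ctl develop observability incident -i ${incident-id}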
dataos-ctl develop observability incident -i dkzlw8l3nksg
INFO[0000] 🔮 develop observability...
INFO[0000] 🔮 develop observability...monitor tcp-stream...starting
INFO[0001] 🔮 develop observability...monitor tcp-stream...running
INFO[0002] 🔮 develop observability...monitor tcp-stream...stopping
INFO[0002] 🔮 context cancelled, monitor tcp-stream is closing.
INFO[0003] 🔮 develop observability...complete
{
  "id": "dkzlw8l3nksg",
  "createTime": "2024-05-07T12:28:00.931693759Z",
  "properties": {
    "category": "test",
    "incidentType": "resource_consumption",
    "name": "monitor-incident",
    "severity": "high",
    "stuff": "stuff",
    "summary": "some summary",
    "type": "pulsar"
  },
  "equationContext": {
    "queryExpressions": [
      {
        "leftExpressionValue": "53375.00",
        "rightExpressionValue": "100000.00",
        "leftRow": {
          "comparisonColumn": {
            "name": "_col0",
            "value": "53375.00"
          }
        },
        "rightRow": {
          "comparisonColumn": {
            "name": "constant",
            "value": "1.00"
          }
        }
      }
    ]
  },
  "monitor": {
    "id": "monitorthemisnew01_public",
    "name": "monitorthemisnew01",
    "description": "table row count threshold",
    "schedule": "*/2 * * * *",
    "timezone": "UTC",
    "type": "equation_monitor",
    "equationMonitor": {
      "leftExpression": {
        "queryCoefficient": 1,
        "queryConstant": 0,
        "query": {
          "type": "trino",
          "cluster": "system",
          "ql": "select count(*) from \"icebase\".\"retail\".city"
        }
      },
      "rightExpression": {
        "queryCoefficient": 0,
        "queryConstant": 100000
      },
      "operator": "less_than"
    }
  }
}
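Because the incident message is plain JSON, it is straightforward to consume downstream. The sketch below uses Python's standard `json` module to pull out the values that explain why the incident fired. It works on a trimmed copy of the message above, and the summary format is just an illustrative choice.

```python
import json

# Trimmed copy of the incident message shown above.
INCIDENT = """
{
  "id": "dkzlw8l3nksg",
  "properties": {"severity": "high", "incidentType": "resource_consumption"},
  "equationContext": {
    "queryExpressions": [
      {"leftExpressionValue": "53375.00", "rightExpressionValue": "100000.00"}
    ]
  },
  "monitor": {
    "name": "monitorthemisnew01",
    "equationMonitor": {"operator": "less_than"}
  }
}
"""

incident = json.loads(INCIDENT)
expr = incident["equationContext"]["queryExpressions"][0]
left = float(expr["leftExpressionValue"])    # expression values arrive as strings
right = float(expr["rightExpressionValue"])
op = incident["monitor"]["equationMonitor"]["operator"]
severity = incident["properties"]["severity"]
name = incident["monitor"]["name"]

summary = f"{name}: {left:.0f} {op} {right:.0f} (severity: {severity})"
print(summary)  # monitorthemisnew01: 53375 less_than 100000 (severity: high)
```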
Next steps¶
Your next steps depend upon whether you want to set up alerts on top of the incidents generated using Monitor or want to learn about the different types and key concepts associated with Monitor:
- Learn how to set up alerts on top of incidents generated using Pager Resource. See Pager: Quick Start Guide.
- Add context column to the Monitor. Learn How to add context column to the Monitors?
- Set up incidents on top of events using Report Monitor. See How to Create a Report Monitor?
- Set up incidents on top of streaming data using Stream Monitor. Learn How to Create a Stream Monitor?
- Set up incidents on top of Lens using Equation Monitor. See How to Create a Equation Monitor on top of Lens?
- Set up incidents on top of Postgres source using Equation Monitor. See How to Create a Equation Monitor on top of Postgres source?
- Set up incidents for certificate expiration using Equation Monitor. See How to Generate incidents for certificate expiration?