
Monitor

The Monitor Resource is an integral part of DataOS's Observability System, designed to trigger incidents based on specific events or metrics. By leveraging the Monitor Resource alongside the Pager Resource, DataOS users can achieve comprehensive observability and proactive incident management, ensuring high system reliability and performance.

  • Create and manage a Monitor: learn how to create and manage a Monitor Resource in DataOS.
  • Monitor attributes: discover how to configure the manifest file of a Monitor by adjusting its attributes.
  • Working of a Monitor: understand the inner workings of a Monitor within DataOS.
  • Monitor usage examples: explore examples showcasing the usage of Monitor in various scenarios.

Key Concepts of Monitor

Metric

A quantifiable measure used to track and assess the status or performance of specific aspects of a system, process, or business function. Metrics are objective, numerical data points that are collected and analyzed to identify trends, patterns, or deviations from expected norms. For example,

  • Average CPU Utilization: Average percentage of CPU usage over a day/month/year.
  • Customer Churn Rate: Percentage of customers lost in the past year.

Event

A discrete occurrence within a system or process that reflects a specific change, action, or anomaly. Unlike metrics, which are quantitative measures, events are discrete and qualitative. They range from system-generated alerts, such as errors or status changes, to user-initiated actions, such as deployments or configuration changes. For example,

  • Data Pipeline Failure: A data processing pipeline encounters an error, halting data flow.
  • Schema Change in Data Source: A modification is detected in the schema of a source database.

Condition

A user-defined criterion or a set of criteria that must be satisfied by a metric or event for an incident to be generated. For example,

  • Average CPU Utilization (metric) exceeds 85% (condition).
  • IoT device reports (event) connectivity 'loss' (condition).

Incident

The output generated by a Monitor when a predefined condition on a metric or event is met. For example,

  • An incident generated when either the Average CPU Utilization (metric) exceeds its predefined threshold or an Error 500 occurs (event), indicating a performance issue or a critical failure, respectively.

How does a Monitor work?

A Monitor in DataOS takes metrics and events as input and generates incidents as output.

Process

  1. Observation: The Monitor observes metrics or events at a specific cadence.
  2. Evaluation: Each observed metric or event is assessed to determine if it matches a predefined condition for triggering an incident.
  3. Incident Generation: If a metric or event satisfies the condition, the Monitor generates an incident and sends it to the Incident stream.
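The three steps above can be pictured as one scheduled cycle. The Python below is a minimal illustration under stated assumptions, not DataOS code: `run_monitor_cycle` and its `observe`, `condition`, and `publish` callables are hypothetical stand-ins for the Metric & Event store query, the user-defined condition, and the Incident stream.

```python
from typing import Any, Callable, Optional


def run_monitor_cycle(
    observe: Callable[[], Any],
    condition: Callable[[Any], bool],
    publish: Callable[[dict], None],
    incident: dict,
) -> Optional[dict]:
    """Run one scheduled check; return the published incident, or None."""
    observation = observe()  # 1. Observation: read the metric or event
    if not condition(observation):  # 2. Evaluation: compare with the condition
        return None
    event = {**incident, "observed": observation}
    publish(event)  # 3. Incident generation: send it to the incident stream
    return event


# Hypothetical example: average CPU utilization above 85% raises an incident.
stream: list = []
result = run_monitor_cycle(
    observe=lambda: 91.5,
    condition=lambda cpu: cpu > 85,
    publish=stream.append,
    incident={"name": "cpu-usage-spike", "severity": "high"},
)
```

In DataOS the scheduler, not user code, drives this loop at the cadence set by the `schedule` attribute; the sketch only shows the order of the steps.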

Following incident publication, users can configure alerts based on these incidents. This enables targeted notification through the creation of Pager Resource instances, facilitating prompt response to incidents detected by the Monitor.

Monitor Service Architecture

The Monitor Service has two major components: the Monitor DB and the Scheduler. The Monitor DB stores the manifest file of each Monitor instance. The Scheduler reads these manifests to query the Metric & Event store, which is a proxy for any storage system, such as an SQL database, Prometheus, a streaming database, or another queryable source of choice.

The Monitor Service checks the condition declared by the user against the events and metrics generated in the system. If the condition is fulfilled, it publishes the information specified in the manifest file as an incident to a Fastbase topic. Fastbase topics are backed by Apache Pulsar.

Working of a Monitor Service


Types of Monitor


Monitors are classified into three types based on the format of the data source they query and the data type they evaluate. The following table lists each Monitor type with its data source types, data type, and how its condition is defined.

Monitor Type     | Data Source Type              | Data Type | Condition Definition
-----------------|-------------------------------|-----------|--------------------------------
Equation Monitor | Icebase, Prometheus, Postgres | Numeric   | SQL Query, PROM Query
Report Monitor   | APIs                          | String    | JQ filtering and value matching
Stream Monitor   | Fastbase Topic                | String    | JQ filtering and value matching

Report Monitor

This Monitor type is suitable when the value you want to match is a string, e.g., the status or runtime status of a DataOS Resource.

Equation Monitor

This Monitor type is suitable when the metric or value you want to observe is numeric and resides in Icebase, Prometheus, or Postgres.

Stream Monitor

This Monitor type is suitable when the value you want to match comes from a stream, e.g., a Fastbase topic.

How to create and manage a Monitor?

In DataOS, users have the capability to instantiate Monitor Resources by creating manifest files (YAML configuration files) and applying them via the DataOS CLI.

Create a Monitor Resource manifest

Monitor Resource YAML

The structure of a Monitor Resource manifest encompasses the following sections:

Resource Meta Section

The Resource meta section is a standardized component across all DataOS Resource manifests, detailing essential metadata attributes necessary for Resource identification, classification, and management. This metadata is organized within the Poros Database.

Below is a comprehensive table outlining the attributes encompassed within the Resource meta section:

Attribute   | Data Type        | Default Value                                 | Possible Value                                                                                                                                   | Requirement
------------|------------------|-----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|------------
name        | string           | none                                          | alphanumeric values matching the RegEx [a-z0-9]([-a-z0-9]*[a-z0-9])?; a hyphen/dash is the only special character allowed; total length must be less than or equal to 48 characters | mandatory
version     | string           | none                                          | v1alpha, v1beta                                                                                                                                  | mandatory
type        | string           | none                                          | monitor                                                                                                                                          | mandatory
tags        | list of strings  | none                                          | any string; special characters are allowed                                                                                                       | optional
description | string           | none                                          | any string                                                                                                                                       | optional
owner       | string           | user ID of the user who applies the Monitor   | any valid DataOS user ID                                                                                                                         | optional
layer       | string           | user                                          | user/system                                                                                                                                      | optional
monitor     | mapping          | none                                          | valid Monitor Resource-specific attributes                                                                                                       | mandatory

For further details on each attribute, refer to the linked documentation: Attributes of Resource-specific section.
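Before applying a manifest, it can help to check the `name` locally. The Python below is a minimal sketch: it uses the regex as printed in the CLI validation error under "Monitor Errors" and assumes the 48-character limit from the table is inclusive. The server-side validator remains authoritative, and `is_valid_monitor_name` is a hypothetical helper, not part of any DataOS tooling.

```python
import re

# Regex as reported by the CLI validation error; length limit from the table above.
NAME_PATTERN = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_NAME_LENGTH = 48  # assumption: "less than or equal to 48 characters"


def is_valid_monitor_name(name: str) -> bool:
    """Return True if `name` satisfies the documented Monitor naming rules."""
    if len(name) > MAX_NAME_LENGTH:
        return False
    # fullmatch: the whole name must match, not just a prefix of it.
    return NAME_PATTERN.fullmatch(name) is not None
```

For example, `my-monitor` passes, while `failed_row-accuracy-monitor` (underscore), `-monitor` (leading hyphen), and `Monitor1` (uppercase) are rejected.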

Monitor-specific Section

The Monitor-specific section of a manifest file comprises attributes specific to the Monitor Resource. This section is subdivided into the following parts:

Schedule

The schedule attribute determines the frequency with which the Monitor Service queries the metric/event store to detect specified events or metrics. It utilizes a cron expression to specify this cadence.

A typical declaration within the manifest file that configures the Monitor Service to perform its check every 4 minutes looks like this:

monitor:
  schedule: '*/4 * * * *'
  # ...other Monitor-specific attributes
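The `*/4` in the minute field is a cron step: it fires on every fourth minute of each hour. The short Python sketch below expands such a field to make this concrete. It only handles `*` and `*/N` in the minute position of a standard five-field expression (minute, hour, day of month, month, day of week), which is an assumption for illustration; it is not a full cron parser and not how DataOS evaluates schedules.

```python
def expand_minute_field(field: str) -> list[int]:
    """Expand a cron minute field written as '*' or '*/N' into the minutes it fires on."""
    if field == "*":
        return list(range(60))
    if field.startswith("*/"):
        step = int(field[2:])
        if step <= 0:
            raise ValueError("step must be a positive integer")
        return list(range(0, 60, step))
    raise ValueError(f"unsupported minute field: {field!r}")


schedule = "*/4 * * * *"  # minute hour day-of-month month day-of-week
fields = schedule.split()
minutes = expand_minute_field(fields[0])  # minutes 0, 4, 8, ... 56 of every hour
```

Note that a stray space, as in `* /4 * * * *`, splits into six fields and shifts every value into the wrong position.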
Incident

The incident attribute within the Monitor-specific section defines the data published to a designated Fastbase topic when the specified event or metric is detected in the system. This attribute is a mapping, so you can include any key-value pairs relevant to your use case.

The code block provided below shows a sample declaration:

monitor:
  incident:
    incident_type: field_profiling
    asset: output_1
    column: column_2
    severity: critical
    # ... other key-value pairs
  # ...other Monitor-specific attributes
Monitor Type

The type attribute specifies the kind of Monitor being configured. The value to use depends on the data source and the data type being evaluated:

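Data Source Type              | Data Type | Condition Definition            | Monitor Type     | Type Attribute Value
------------------------------|-----------|---------------------------------|------------------|-----------------------
Icebase, Prometheus, Postgres | Numeric   | SQL Query, PROM Query           | Equation Monitor | type: equation_monitor
APIs                          | String    | JQ filtering and value matching | Report Monitor   | type: report_monitor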

Equation Monitor

This Monitor type is suitable when the metric or event you want to observe resides in an Icebase-type depot or in Prometheus. In addition to the attributes described above, the following attributes must be declared in an Equation Monitor manifest file.

Equation Monitor specification

The YAML below shows a sample Equation Monitor manifest file.

# Resource meta section
name: ${{certificateexpirymonitornew}} # Resource name
version: v1alpha
type: monitor
tags:
  - ${{dataos:type:resource}} # Tags
description: ${{SSL certificate is about to expire in less than 24 hrs}} # Resource description
layer: user
runAsUser: ${{iamgroot}} # User ID of User (or use case assignee)
monitor:

# Monitor-specific section
  schedule: ${{'*/2 * * * *'}} # Monitor schedule
  incident: # Incident
    name: ${{CertificateExpirydata}}
    severity: ${{high}}
    incidentType: ${{certificate_expiry}}

# Equation monitor specification
  type: equation_monitor # Monitor type
  equation: 
    # LHS
    leftExpression:
      queryCoefficient: ${{1}}
      queryConstant: ${{0}}
      query:
        type: ${{prom}}
        cluster: ${{thanos}}
        ql: ${{certmanager_certificate_expiration_timestamp_seconds{...} - time()}}
    # RHS
    rightExpression:
      queryConstant: ${{7766092}}
    # Operator
    operator: ${{less_than}}

equation

The equation attribute defines the condition or the criterion triggering an incident based on specific events or metrics. For example, a condition to generate an incident whenever the total expenditure exceeds a threshold value.

It is articulated as a mathematical equation within the manifest file and consists of the following components:

  • Left-Hand Side (LHS): The left-hand side of the equation represents the current state or the specific data the user intends to monitor. It is specified using the leftExpression attribute.
  • Right-Hand Side (RHS): The right-hand side of the equation represents the benchmark or threshold value for comparison. It is specified using the rightExpression attribute.
  • Operator: The relationship between LHS and RHS is established through the operator attribute, whose value is one of six conditions: equals (=), greater_than (>), less_than (<), greater_than_equals (≥), less_than_equals (≤), or not_equals (≠).
Condition as a Mathematical Equation

Values for LHS and RHS can be dynamic, sourced from queries, or static. A dynamic value may be adjusted by a coefficient, while a static value acts as a constant.

Example: Monitoring a Metric

Scenario: Triggering an alert when a customer's total expenditure surpasses 25% of their spending limit.

LHS Calculation: A query retrieves the total expenditure, applying a coefficient of 1 and a constant of 0.
RHS Calculation: A query determines the spending limit, using a coefficient of 0.25 (25%) and a constant of 0.

Operator: The condition uses the "greater than or equal to" (≥) operator, declared as greater_than_equals.
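The scenario above can be worked through numerically. The Python below is a hedged sketch, not the Monitor Service implementation: it assumes each side evaluates to `coefficient * query_result + constant`, that a side without a query contributes only its constant, and that an omitted coefficient defaults to 1. The query results (3,000 spent against a 10,000 limit) are hypothetical.

```python
import operator
from typing import Optional

# The six values accepted by the `operator` attribute.
OPERATORS = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "greater_than": operator.gt,
    "less_than": operator.lt,
    "greater_than_equals": operator.ge,
    "less_than_equals": operator.le,
}


def expression_value(
    query_result: Optional[float], coefficient: float = 1.0, constant: float = 0.0
) -> float:
    """Assumed form: coefficient * query result + constant (no query contributes 0)."""
    base = 0.0 if query_result is None else query_result
    return coefficient * base + constant


def condition_met(lhs: float, op: str, rhs: float) -> bool:
    """Apply the named operator as `lhs <op> rhs`."""
    return OPERATORS[op](lhs, rhs)


# Spending-limit scenario with hypothetical query results.
lhs = expression_value(3_000.0, coefficient=1, constant=0)      # total expenditure
rhs = expression_value(10_000.0, coefficient=0.25, constant=0)  # 25% of the limit
incident_raised = condition_met(lhs, "greater_than_equals", rhs)  # 3000 >= 2500
```

This assumed form is consistent with the sample incident payload shown later on this page, where a right-hand side with no query and a constant of 100000 yields a right_expression_value of 100000.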

The table below summarizes the various attributes in Equation Monitor manifest.

Attribute        | Data Type | Default Value | Possible Value                                                                             | Requirement
-----------------|-----------|---------------|--------------------------------------------------------------------------------------------|------------
equation         | mapping   | none          | none                                                                                       | mandatory
leftExpression   | mapping   | none          | none                                                                                       | mandatory
rightExpression  | mapping   | none          | none                                                                                       | mandatory
queryCoefficient | number    | none          | any number                                                                                 | mandatory
queryConstant    | number    | none          | any number                                                                                 | mandatory
query            | mapping   | none          | valid query attributes                                                                     | mandatory
type             | string    | none          | prom, trino, postgres                                                                      | mandatory
cluster          | string    | none          | minerva, thanos, themis                                                                    | mandatory
operator         | string    | none          | equals, greater_than, less_than, not_equals, greater_than_equals, less_than_equals         | mandatory

For more information regarding the various attributes, refer to the link: Attributes of Monitor manifest.

Report Monitor

The Report Monitor is suitable for scenarios where the target metric or event is identified by string values, such as the status or runtime status of a DataOS Resource.

Report Monitor specification

Below is an example of a Report Monitor configuration defined in a YAML manifest:

# Resource meta section
name: ${{runtimemonitorcamelspec}}
version: v1alpha
type: monitor
tags:
  - ${{dataos:type:resource}}
  - ${{dataos:layer:user}}
description: ${{Attention! workflow run failed.}}
layer: user
monitor:

# Monitor-specific section
  runAsUser: ${{iamgroot}}
  schedule: ${{'*/2 * * * *'}}
  incident:
    name: workflowrunfailed
    severity: high
    incidentType: workflowruntimefailure
  type: report_monitor
# Report Monitor specification
  report:
    source:
      dataOsInstance:
         path: ${{/collated/api/v1/reports/resources/runtime?id=workflow:v1:snowflakescannerdepotis:public}}
    conditions:
    - valueComparison:
        observationType: ${{runtime}}
        valueJqFilter: ${{'.value'}}
        operator: ${{equals}}
        value: ${{running}}
    - valueComparison:
        observationType: ${{workflow-runs}}
        valueJqFilter: ${{'.value[] | select (.started | fromdateiso8601 > (now-113600)) | .phase'}}
        operator: ${{equals}}
        value: ${{succeeded}}
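The second condition's valueJqFilter keeps only the runs whose started timestamp falls within the last 113,600 seconds (roughly 31.6 hours) and emits each run's phase. For readers less familiar with jq, the Python function below is a hedged analogue. It assumes each run is an object with `started` and `phase` keys and that timestamps use the `YYYY-MM-DDTHH:MM:SSZ` form accepted by jq's `fromdateiso8601`; the actual report structure may differ, and `recent_run_phases` is a hypothetical name.

```python
from datetime import datetime, timezone


def recent_run_phases(runs: list, now: float, window_seconds: int = 113_600) -> list:
    """Python analogue of the jq filter
    .value[] | select(.started | fromdateiso8601 > (now - 113600)) | .phase

    `runs` stands in for the report's `.value` array; `now` is a Unix timestamp.
    """
    phases = []
    for run in runs:
        started = datetime.strptime(run["started"], "%Y-%m-%dT%H:%M:%SZ")
        started_ts = started.replace(tzinfo=timezone.utc).timestamp()
        if started_ts > now - window_seconds:  # started within the window
            phases.append(run["phase"])
    return phases
```

Passing `now` explicitly, rather than reading the clock as jq's `now` does, keeps the function deterministic and easy to check.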

report

The report attribute specifies the data source and the conditions that the Monitor evaluates against the report data.

source

The source attribute defines the origin of the report. It contains a dataOsInstance mapping whose path attribute takes an API endpoint path. The domain of the environment in which you implement the Monitor is prefixed to this path, so you do not need to specify the environment name.

monitor:
  report:
    source:
      dataOsInstance:
         path: ${{/collated/api/v1/reports/resources/runtime?id=workflow:v1:snowflakescannerdepotis:public}}

conditions

This attribute outlines the criteria for incident activation. Multiple conditions can be defined; an incident is triggered only if all conditions are met.

monitor:
  report:
    conditions: # mandatory
      - valueComparison:
          observationType: runtime # mandatory
          valueJqFilter: '.value' # mandatory
          operator: equals # mandatory
          value: running # mandatory
        durationComparison:
          observationType: runtime # mandatory
          selectorJqFilter: # mandatory
          startedJqFilter: # mandatory
          completedJqFilter: # mandatory
          operator: # mandatory
          value: # mandatory
  • valueComparison: Specifies a condition based on value matching.
    • observationType: Defines the type of data under observation (e.g., "runtime", "workflow-runs").
    • valueJqFilter: A JQ filter expression that extracts specific values from the data. For details on JQ filters, refer to the jq manual.
    • operator: The comparison operator (e.g., "equals") used to compare extracted and expected values.
    • value: The reference value for comparison, determining the condition's truth.
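The all-conditions-must-hold rule can be sketched as follows. This is a minimal Python illustration under stated assumptions: each valueJqFilter is replaced by a plain Python function, reports are keyed by observationType, and `ValueComparison` and `should_raise_incident` are hypothetical names rather than DataOS APIs.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Operator names from the Report Monitor attribute table.
OPERATORS = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "greater_than": operator.gt,
    "less_than": operator.lt,
    "greater_than_equals": operator.ge,
    "less_than_equals": operator.le,
}


@dataclass
class ValueComparison:
    observation_type: str          # e.g. "runtime" or "workflow-runs"
    extract: Callable[[Any], Any]  # stands in for valueJqFilter
    op: str                        # value of the operator attribute
    value: Any                     # reference value

    def holds(self, report: Any) -> bool:
        return OPERATORS[self.op](self.extract(report), self.value)


def should_raise_incident(conditions: List[ValueComparison], reports: Dict[str, Any]) -> bool:
    """An incident is raised only when every condition holds (logical AND)."""
    return all(c.holds(reports[c.observation_type]) for c in conditions)


# Hypothetical condition mirroring valueJqFilter '.value' == "running".
conditions = [ValueComparison("runtime", lambda r: r["value"], "equals", "running")]
```

For instance, `should_raise_incident(conditions, {"runtime": {"value": "running"}})` returns True, while a report whose value is "failed" returns False.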

The table below summarizes the various attributes in the Report Monitor manifest.

Attribute          | Data Type        | Default Value | Possible Value                                                                     | Requirement
-------------------|------------------|---------------|------------------------------------------------------------------------------------|------------
report             | mapping          | none          | none                                                                               | optional
source             | mapping          | none          | none                                                                               | mandatory
dataOsInstance     | mapping          | none          | none                                                                               | mandatory
path               | string           | none          | any valid API endpoint paths                                                       | mandatory
conditions         | list of mappings | none          | none                                                                               | mandatory
valueComparison    | mapping          | none          | valid query attributes                                                             | mandatory
durationComparison | mapping          | none          | valid query attributes                                                             | mandatory
operator           | string           | none          | equals, greater_than, less_than, not_equals, greater_than_equals, less_than_equals | mandatory
value              | string           | none          | any valid string                                                                   | mandatory

For more information regarding the various attributes, refer to the link: Attributes of Report manifest.

Apply the Monitor manifest through CLI

After creating the Monitor manifest file, it's time to apply it to instantiate the Resource-instance in the DataOS environment. To apply the Monitor manifest file, utilize the resource apply command.

dataos-ctl resource apply -f ${{manifest-file-path}} -w ${{workspace-name}}

# Sample
dataos-ctl resource apply -f /home/Desktop/my-monitor.yaml -w curriculum

# Expected Output
INFO[0000] 🔍 apply...
INFO[0001] 🔍 applying(curriculum) cpu-usage-spike:v1alpha:monitor...
INFO[0002] 🔍 applying(curriculum) cpu-usage-spike:v1alpha:monitor...created
INFO[0003] 🔍 apply...complete

Verify Monitor Status

Use the below command to get all the existing monitors for all owners.

dataos-ctl resource get -t monitor -w ${{workspace name}} -a

dataos-ctl resource get -t monitor -w curriculum -a

# Expected Output
INFO[0000] 🔍 get...
INFO[0000] 🔍 get...complete

       NAME     | VERSION |  TYPE   | WORKSPACE  | STATUS | RUNTIME | OWNER      
----------------|---------|---------|------------|--------|---------|----------------
    my-monitor  | v1alpha | monitor | curriculum | active |         | iamgroot 
    monitor101  | v1alpha | monitor | curriculum | active |         | thor

You can also access the details of any created Monitor through the DataOS GUI in the Operations App.

Check for Incident Messages

  • To review incident messages dispatched by the Monitor, access the specified topic (persistent://public/default/monitor-incident-new) where these notifications are published. Execute the following command in the terminal:
dataos-ctl fastbase topic consume -p -s -t persistent://system/monitor/monitor-incident-new
Sample output
dataos-ctl fastbase topic read -p -t persistent://public/default/monitor-incident
INFO[0000] 🔍 read...
INFO[0000] Connecting to broker                          remote_addr="pulsar+ssl://tcp.sunny-prawn.dataos.app:6651"
INFO[0000] TCP connection established                    local_addr="192.168.1.81:51405" remote_addr="pulsar+ssl://tcp.sunny-prawn.dataos.app:6651"
INFO[0001] Connection is ready                           local_addr="192.168.1.81:51405" remote_addr="pulsar+ssl://tcp.sunny-prawn.dataos.app:6651"
INFO[0003] Connecting to broker                          remote_addr="pulsar+ssl://tcp.sunny-prawn.dataos.app:6651"
INFO[0003] TCP connection established                    local_addr="192.168.1.81:51406" remote_addr="pulsar+ssl://tcp.sunny-prawn.dataos.app:6651"
INFO[0005] Connection is ready                           local_addr="192.168.1.81:51406" remote_addr="pulsar+ssl://tcp.sunny-prawn.dataos.app:6651"
INFO[0006] Connected consumer                            consumerID=1 name=54a144e1c26be58b2ed28354513c581f37419ffad71c5e7d964d524bfd8d7077 subscription=reader-zvyoa topic="persistent://public/default/monitor-incident"
INFO[0006] Created consumer                              consumerID=1 name=54a144e1c26be58b2ed28354513c581f37419ffad71c5e7d964d524bfd8d7077 subscription=reader-zvyoa topic="persistent://public/default/monitor-incident"
INFO[0006] Broker notification of Closed consumer: 1     local_addr="192.168.1.81:51406" remote_addr="pulsar+ssl://tcp.sunny-prawn.dataos.app:6651"
INFO[0006] Reconnecting to broker in 101.592638ms        consumerID=1 name=54a144e1c26be58b2ed28354513c581f37419ffad71c5e7d964d524bfd8d7077 subscription=reader-zvyoa topic="persistent://public/default/monitor-incident"
INFO[0006] Connected consumer                            consumerID=1 name=54a144e1c26be58b2ed28354513c581f37419ffad71c5e7d964d524bfd8d7077 subscription=reader-zvyoa topic="persistent://public/default/monitor-incident"
INFO[0006] Reconnected consumer to broker                consumerID=1 name=54a144e1c26be58b2ed28354513c581f37419ffad71c5e7d964d524bfd8d7077 subscription=reader-zvyoa topic="persistent://public/default/monitor-incident"
{"id":"CMAYEA0YACAA","string_id":"3136:13:0","payload":"eyJtb25pdG9yIjp7Im5hbWUiOiJ0YWJsZXJvd2NvdW50IiwiZGVzY3JpcHRpb24iOiJ0YWJsZSByb3cgY291bnQgdGhyZXNob2xkIiwicHJvcGVydGllcyI6e30sInNjaGVkdWxlIjoiMSAqLzQgKiA/ICogKiIsImluY2lkZW50Ijp7InR5cGUiOiJwdWxzYXIiLCJuYW1lIjoibW9uaXRvci1pbmNpZGVudCIsInN1bW1hcnkiOiJzb21lIHN1bW1hcnkiLCJjYXRlZ29yeSI6InRlc3QifSwiZXF1YXRpb24iOnsibGVmdF9leHByZXNzaW9uIjp7InF1ZXJ5X2NvZWZmaWNpZW50IjoxLCJxdWVyeV9jb25zdGFudCI6MCwicXVlcnkiOnsidHlwZSI6InRyaW5vIiwiY2x1c3RlciI6InRoYW5vcyIsInFsIjoic2VsZWN0IGNvdW50KCopIGZyb20gaWNlYmFzZS5yZXRhaWwuY2l0eTsifX0sInJpZ2h0X2V4cHJlc3Npb24iOnsicXVlcnlfY29lZmZpY2llbnQiOjEsInF1ZXJ5X2NvbnN0YW50IjoxMDAwMDAsInF1ZXJ5IjpudWxsfSwib3BlcmF0b3IiOiJncmVhdGVyX3RoYW4ifX0sImxlZnRfZXhwcmVzc2lvbl92YWx1ZSI6MTA2NzUwLCJyaWdodF9leHByZXNzaW9uX3ZhbHVlIjoxMDAwMDB9","publish_time":"2023-10-09T14:34:28.684+05:30","event_time":"1970-01-01T05:30:00+05:30","producer_name":"pulsar-52-31","topic":"persistent://public/default/monitor-incident"}
INFO[0008] no more messages to read...exiting           
INFO[0008] Closing consumer=1                            consumerID=1 name=54a144e1c26be58b2ed28354513c581f37419ffad71c5e7d964d524bfd8d7077 subscription=reader-zvyoa topic="persistent://public/default/monitor-incident"
INFO[0008] Closed consumer                               consumerID=1 name=54a144e1c26be58b2ed28354513c581f37419ffad71c5e7d964d524bfd8d7077 subscription=reader-zvyoa topic="persistent://public/default/monitor-incident"
INFO[0008] close consumer, exit reconnect                consumerID=1 name=54a144e1c26be58b2ed28354513c581f37419ffad71c5e7d964d524bfd8d7077 subscription=reader-zvyoa topic="persistent://public/default/monitor-incident"
INFO[0008] 🔍 read...complete
  • Save the entire message in a new JSON file, then use the following command to extract its payload and decode it with a base64 decoder:
dataos-ctl jq -f ${{json file path}} --filter '.payload' | base64 --decode | jq . 
Sample Output

Retrieving and Processing Incident Messages

Storing the JSON Message

First, save the JSON message to a new file (the commands below use new.json) so that it can be processed in the next step. Example JSON structure:

{"id":"CMAYEA0YACAA","string_id":"3136:13:0","payload":"eyJtb25pdG9yIjp7Im5hbWUiOiJ0YWJsZXJvd2NvdW50IiwiZGVzY3JpcHRpb24iOiJ0YWJsZSByb3cgY291bnQgdGhyZXNob2xkIiwicHJvcGVydGllcyI6e30sInNjaGVkdWxlIjoiMSAqLzQgKiA/ICogKiIsImluY2lkZW50Ijp7InR5cGUiOiJwdWxzYXIiLCJuYW1lIjoibW9uaXRvci1pbmNpZGVudCIsInN1bW1hcnkiOiJzb21lIHN1bW1hcnkiLCJjYXRlZ29yeSI6InRlc3QifSwiZXF1YXRpb24iOnsibGVmdF9leHByZXNzaW9uIjp7InF1ZXJ5X2NvZWZmaWNpZW50IjoxLCJxdWVyeV9jb25zdGFudCI6MCwicXVlcnkiOnsidHlwZSI6InRyaW5vIiwiY2x1c3RlciI6InRoYW5vcyIsInFsIjoic2VsZWN0IGNvdW50KCopIGZyb20gaWNlYmFzZS5yZXRhaWwuY2l0eTsifX0sInJpZ2h0X2V4cHJlc3Npb24iOnsicXVlcnlfY29lZmZpY2llbnQiOjEsInF1ZXJ5X2NvbnN0YW50IjoxMDAwMDAsInF1ZXJ5IjpudWxsfSwib3BlcmF0b3IiOiJncmVhdGVyX3RoYW4ifX0sImxlZnRfZXhwcmVzc2lvbl92YWx1ZSI6MTA2NzUwLCJyaWdodF9leHByZXNzaW9uX3ZhbHVlIjoxMDAwMDB9","publish_time":"2023-10-09T14:34:28.684+05:30","event_time":"1970-01-01T05:30:00+05:30","producer_name":"pulsar-52-31","topic":"persistent://public/default/monitor-incident"}

Processing the Payload

For payload processing, utilize the jq command with a base64 decoder. The jq command facilitates JSON parsing and filtering:

  • Decoding and Formatting the Payload:
    dataos-ctl jq -f new.json --filter '.payload' | base64 --decode | jq '.'
  • Extracting and Validating Specific Data:
    dataos-ctl jq -f new.json --filter '.payload' | base64 --decode | jq -r '.monitor.equation.left_expression.query.type' | grep -q 'trino' && echo "OK"
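Where jq is not available, the same decoding can be done with Python's standard library. The sketch below assumes the saved file holds a single message object with a base64-encoded `payload` field, as in the example above; `load_message` and `decode_incident_payload` are hypothetical helper names.

```python
import base64
import json


def load_message(path: str) -> dict:
    """Read one consumed incident message saved as a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def decode_incident_payload(message: dict) -> dict:
    """Base64-decode the message's `payload` field and parse it as JSON."""
    return json.loads(base64.b64decode(message["payload"]))
```

For example, `decode_incident_payload(load_message("new.json"))["monitor"]["equation"]["left_expression"]["query"]["type"]` performs the same lookup as the jq validation command above.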

Sample Processed Output

Below is the structured JSON after processing. It concisely presents the monitor's details, including name, description, scheduling, incident configuration, and the condition-triggering equation:

{
  "monitor": {
    "name": "tablerowcount",
    "description": "table row count threshold",
    "properties": {},
    "schedule": "1 */4 * ? * *",
    "incident": {
      "type": "pulsar",
      "name": "monitor-incident",
      "summary": "some summary",
      "category": "test"
    },
    "equation": {
      "left_expression": {
        "query_coefficient": 1,
        "query_constant": 0,
        "query": {
          "type": "trino",
          "cluster": "thanos",
          "ql": "select count(*) from icebase.retail.city;"
        }
      },
      "right_expression": {
        "query_coefficient": 1,
        "query_constant": 100000,
        "query": null
      },
      "operator": "greater_than"
    }
  },
  "left_expression_value": 106750,
  "right_expression_value": 100000
}
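The last two key-value pairs are the most important: left_expression_value is the observed value returned by the LHS query (106750), and right_expression_value is the threshold (100000). Because 106750 is greater than 100000, the greater_than condition was met and the incident was published.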

This step confirms that your Monitor is publishing incidents. Alternatively, you can verify this by creating a Pager and receiving alerts.

Get alert using Pager

After an incident is published, users can set up alerts based on these incidents. This is achieved by creating Pager Resource instances, which allow for targeted notifications. This process ensures timely responses to incidents identified by the Monitor.

Updating a Monitor

To change an existing Monitor, edit its manifest file and then run either resource apply or resource update. In both cases, the Monitor is updated:

dataos-ctl resource update -t monitor -w curriculum -n my-monitor

Get details of a specific Monitor

To retrieve the status of a specific Monitor including details in the result, use the following command:

dataos-ctl resource get -t monitor -w curriculum -n ${{monitor name}} -d

Get runtime status of a Monitor

To retrieve the runtime status of a Monitor, use the command:

dataos-ctl get runtime -i 'monitorname | version | resourcetype | workspace'
e.g.,
dataos-ctl get runtime -i 'runtimedatatoolmonitor | v1alpha | monitor | public'

Example Responses

  • Response of the get command with details (Monitor Details):

    NAME                     | VERSION | TYPE    | WORKSPACE | STATUS | RUNTIME                  | OWNER
    -------------------------|---------|---------|-----------|--------|--------------------------|--------------
    runtimedatatoolmonitor   | v1alpha | monitor | public    | active | next:2024-03-12T14:22:00 | iamgroot
    

  • Response of the get runtime command (Runtime Status):

    RUN ID       | STARTED                   | FINISHED                  | RUN STATUS | RESULT
    -------------|---------------------------|---------------------------|------------|----------------------------------------------------------------------------------------
    dff8x9an6xvk | 2024-03-12T14:20:00+05:30 | 2024-03-12T14:20:00+05:30 | completed  | 🟩 monitor condition met for monitor: 'runtimedatatoolmonitor_public', created incident 'dff8x9rrzv9c'
    

Deleting a Monitor

Before deleting a Monitor, you must delete all Resources that depend on it, such as Pagers, so that no dependencies remain that could cause issues during deletion. Then use the resource delete command to remove the Monitor Resource-instance from the DataOS environment:

# METHOD 1
dataos-ctl resource delete -t monitor -w ${{workspace-name}} -n ${{name-of-monitor}}
# Sample
dataos-ctl resource delete -t monitor -w curriculum -n my-monitor

# METHOD 2
dataos-ctl resource delete -i "${{identifier string for a Resource: format NAME:VERSION:TYPE:WORKSPACE(optional:default-public)}}"
# Sample 
dataos-ctl resource delete -i "my-monitor | v1alpha | monitor |  curriculum    "

Monitor Errors

  • When applying a monitor configuration with a name that does not comply with the required pattern, an error is returned. Monitor names must adhere to a specific regex pattern and be within a character limit.

    dataos-ctl resource apply -f monitor/accuracy_monitor.yml -l
    INFO[0000] 🛠 apply...
    INFO[0000] 🔧 applying(public) failed_row-accuracy-monitor:v1alpha:monitor...
    WARN[0001] 🔧 applying(public) failed_row-accuracy-monitor:v1alpha:monitor...error
    ⚠️ Invalid Parameter - failure validating resource : name is invalid 'failed_row-accuracy-monitor', must be less than '48' chars and conform to the following regex: '[a-z0-9]([-a-z0-9]*[a-z0-9])?'
    WARN[0001] 🛠 apply...error
    ERRO[0001] failure applying resources
    
  • The owner field in the manifest file must match the user ID for automatic information population in Metis. Specifying an incorrect ID or a different string results in the resource get command failing to retrieve the Resource. To ensure successful retrieval, either correctly specify the user ID in the owner field or omit it for automatic assignment.

  • The monitor name schema_monitor is reserved and cannot be used for monitor configurations.

  • An Invalid Parameter error occurs when specified schemas or tables in the query do not exist in the targeted Database/Cluster.

    status:
      aggregateStatus: error
      cloudKernelResources:
        - name: testmonitor3-tags-monitor-pe8z
          namespace: curriculum
          version: v1
          kind: Secret
          resource: secrets
          dataplane: hub
          status: created
      webServiceResources:
        - id: testmonitor3_curriculum
          service: monitor
          type: monitor
          self: /monitors/testmonitor3_curriculum
          status: error
          error: Invalid Parameter
      builderState:
        stage: building
        numberOfWantedResources: "2"
        numberOfProcessedResources: "2"
        info: builder encountered an error
    

Monitor recipes