
Core Concepts

Terms

Metric

A quantifiable measure used to track and assess the status or performance of a specific aspect of a system, process, or business function. Metrics are objective, numerical data points that are collected and analyzed to identify trends, patterns, or deviations from expected norms. For example,

  • Average CPU Utilization: Average percentage of CPU usage over a day/month/year.
  • Customer Churn Rate: Percentage of customers lost in the past year.
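As a minimal sketch, both example metrics reduce to simple arithmetic over raw data. The function names and sample values below are hypothetical and only illustrate how a numerical data point is derived.

```python
from statistics import mean


def average_cpu_utilization(samples: list[float]) -> float:
    """Average of CPU-usage percentages sampled over a window (e.g. one day)."""
    if not samples:
        raise ValueError("no samples in window")
    return mean(samples)


def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Percentage of customers lost over the period (e.g. the past year)."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical hourly CPU samples (percent) and yearly customer counts.
cpu_samples = [72.0, 88.0, 91.0, 79.0]
print(average_cpu_utilization(cpu_samples))                      # 82.5
print(churn_rate(customers_at_start=2000, customers_lost=150))   # 7.5
```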

Event

A qualitative occurrence within a system or process that reflects a specific change, action, or anomaly. Unlike metrics, which are quantitative measures, events are discrete and qualitative. They range from system-generated alerts, such as errors or status changes, to user-initiated actions, such as deployments or configuration changes. For example,

  • Data Pipeline Failure: A data processing pipeline encounters an error, halting data flow.
  • Schema Change in Data Source: A modification is detected in the schema of a source database.
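Because an event records *what happened* rather than a number, it is natural to model it as a discrete record. The sketch below assumes a hypothetical `Event` shape (type, source, timestamp, free-form attributes); it is illustrative and not the platform's actual event schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Event:
    """A discrete, qualitative occurrence: what happened, where, and when."""
    event_type: str        # e.g. "pipeline_failure", "schema_change"
    source: str            # system or dataset that emitted the event
    occurred_at: datetime
    attributes: dict = field(default_factory=dict)  # event-specific details


failure = Event(
    event_type="pipeline_failure",
    source="orders-ingest",            # hypothetical pipeline name
    occurred_at=datetime(2024, 1, 15, 3, 42, tzinfo=timezone.utc),
    attributes={"error": "connection timed out", "stage": "load"},
)
schema_change = Event(
    event_type="schema_change",
    source="crm_db.customers",         # hypothetical source table
    occurred_at=datetime(2024, 1, 15, 9, 5, tzinfo=timezone.utc),
    attributes={"added_columns": ["loyalty_tier"]},
)
print(failure.event_type, schema_change.attributes)
```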

Condition

A user-defined criterion or a set of criteria that must be satisfied by a metric or event for an incident to be generated. For example,

  • Average CPU Utilization (metric) exceeds 85% (condition).
  • An IoT device reports (event) a connectivity status of 'loss' (condition).
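One way to picture a condition is as a predicate that returns `True` when it is satisfied. This minimal sketch assumes hypothetical helpers `threshold_exceeded` (for metrics) and `event_matches` (for events); it shows the idea, not how the Monitor declares conditions internally.

```python
from typing import Any, Callable

# A condition is modeled here as a predicate: it returns True when satisfied.
Condition = Callable[[Any], bool]


def threshold_exceeded(limit: float) -> Condition:
    """Condition on a metric: the value is strictly greater than `limit`."""
    return lambda value: value > limit


def event_matches(expected: str) -> Condition:
    """Condition on an event: the reported connectivity equals `expected`."""
    return lambda event: event.get("connectivity") == expected


cpu_over_85 = threshold_exceeded(85.0)
connectivity_lost = event_matches("loss")

print(cpu_over_85(91.2))   # True
print(cpu_over_85(70.0))   # False
print(connectivity_lost({"device_id": "sensor-17", "connectivity": "loss"}))  # True
```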

Incident

Output generated by the Monitor when a predefined condition, based on metrics or events, is met. For example,

  • Output generated when either the Average CPU Utilization (metric) exceeds its predefined threshold or an Error 500 (event) occurs, indicating a performance issue or a critical failure, respectively.
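The example above combines a metric condition and an event condition, either of which produces an incident. The sketch below assumes a hypothetical `Incident` record and `evaluate` function; the field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical incident record emitted when a monitored condition is met."""
    name: str
    kind: str
    detail: str


def evaluate(avg_cpu: float, status_codes: list[int],
             cpu_limit: float = 85.0) -> list[Incident]:
    """Check a metric condition and an event condition; return any incidents raised."""
    incidents = []
    # Metric condition: average CPU utilization above the threshold.
    if avg_cpu > cpu_limit:
        incidents.append(Incident("high-cpu", "performance_issue",
                                  f"average CPU {avg_cpu:.1f}% > {cpu_limit:.0f}%"))
    # Event condition: an Error 500 was observed.
    if 500 in status_codes:
        incidents.append(Incident("http-500", "critical_failure",
                                  "Error 500 observed"))
    return incidents


print(evaluate(91.0, [200, 200]))  # one performance-issue incident
print(evaluate(40.0, [200, 500]))  # one critical-failure incident
print(evaluate(40.0, [200, 404]))  # []
```

Returning a list rather than a single incident lets both conditions fire independently when they hold at the same time.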

Architecture

The Monitor Service has two major components: the Monitor DB and the Scheduler. The Monitor DB stores the manifest file of each Monitor instance. The Scheduler reads these manifest files and uses them to query the Metric & Event store. The Metric & Event store is a proxy for any storage system, such as a SQL database, Prometheus, a streaming database, or another queryable source of choice.
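The read path can be sketched as follows, assuming a dictionary as a stand-in for the Monitor DB and an in-memory SQLite database as a stand-in for the Metric & Event store. The manifest fields (`name`, `query`) and `scheduler_tick` are hypothetical and do not reflect the actual manifest schema.

```python
import json
import sqlite3

# Hypothetical manifest for one Monitor instance; field names are illustrative.
MANIFEST = json.loads("""
{
  "name": "cpu-monitor",
  "query": "SELECT AVG(cpu_pct) FROM cpu_samples"
}
""")

monitor_db = {MANIFEST["name"]: MANIFEST}  # stand-in for the Monitor DB


def build_store() -> sqlite3.Connection:
    """In-memory SQL database standing in for the Metric & Event store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cpu_samples (cpu_pct REAL)")
    conn.executemany("INSERT INTO cpu_samples VALUES (?)",
                     [(80.0,), (90.0,), (94.0,)])
    return conn


def scheduler_tick(store: sqlite3.Connection) -> dict[str, float]:
    """One scheduler pass: read each manifest and run its query against the store."""
    results = {}
    for name, manifest in monitor_db.items():
        (value,) = store.execute(manifest["query"]).fetchone()
        results[name] = value
    return results


print(scheduler_tick(build_store()))  # {'cpu-monitor': 88.0}
```

Keeping the query inside the manifest is what lets the store be swapped: the Scheduler only needs something that can execute the query the manifest declares.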

The Monitor Service cross-checks the condition declared by the user against the events and metrics generated in the system. If the condition is fulfilled, it publishes the information specified in the manifest file as an incident to a Fastbase topic. Fastbase is a store built on Apache Pulsar.
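A minimal sketch of the check-and-publish step follows. It assumes a hypothetical manifest layout (`condition`, `incident`) and replaces the Fastbase topic with an in-memory queue (`InMemoryTopic`), since a real Pulsar-backed topic needs a running broker.

```python
import json
import operator
from collections import deque

# Comparison operators a manifest condition might reference (hypothetical subset).
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}


class InMemoryTopic:
    """Stand-in for a Fastbase topic: an append-only message queue."""

    def __init__(self) -> None:
        self.messages: deque[bytes] = deque()

    def publish(self, payload: bytes) -> None:
        self.messages.append(payload)


def check_and_publish(manifest: dict, observed: float, topic: InMemoryTopic) -> bool:
    """Cross-check the declared condition; publish the manifest's incident on a match."""
    cond = manifest["condition"]
    if not OPS[cond["operator"]](observed, cond["value"]):
        return False
    incident = dict(manifest["incident"], observed_value=observed)
    topic.publish(json.dumps(incident).encode("utf-8"))
    return True


manifest = {  # hypothetical manifest fields
    "condition": {"operator": ">", "value": 85},
    "incident": {"name": "high-cpu", "severity": "high"},
}
topic = InMemoryTopic()
check_and_publish(manifest, 88.0, topic)   # condition met: incident published
check_and_publish(manifest, 60.0, topic)   # condition not met: nothing published
print(len(topic.messages))  # 1
```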

Figure: Working of a Monitor Service