Skip to content

Freshness checks

Ensuring data freshness is essential for maintaining the reliability and accuracy of datasets. In Soda, freshness checks can be defined to monitor and enforce data timeliness. Below are explanations and sample configurations for various types of freshness checks.

Basic freshness check: Ensures that the dataset is not older than a specified duration.

In this example, the freshness(test) < 10d condition verifies that the dataset's most recent record is less than 10 days old.

- freshness(test) < 10d:
    name: Data should not be older than 10 days
    attributes:
      category: Freshness
      title: Validate dataset recency within 10 days

Example with check name: Ensures that data freshness meets a defined threshold and includes a descriptive check name for better traceability.

The following check verifies that the most recent record in the dataset is less than 27 hours old.

- freshness(start_date) < 27h:
    name: Data is fresh
    attributes:
      category: Freshness
      title: Validate that data is updated within the last 27 hours

Example with alert configuration: Defines warning and failure thresholds for freshness checks using alert conditions.

The only comparison symbol that can be used with freshness checks that employ an alert configuration is >.

- freshness(start_date):
    warn: when > 3256d
    fail: when > 3258d
    attributes:
      category: Freshness
      title: Monitor and alert when data exceeds freshness limits

# OR


- freshness(start_date):
    warn:
      when > 3256d
    fail:
      when > 3258d
    attributes:
      category: Freshness
      title: Trigger alerts when data freshness thresholds are exceeded

Example with in-check filter: Ensures that data freshness is validated only for specific subsets of data using a filter condition.

In this example, the check verifies that the records where weight = 10 are not older than 27 hours.

- freshness(start_date) < 27h:
    filter: weight = 10
    attributes:
      category: Freshness
      title: Validate data freshness for records with weight equal to 10

List of freshness thresholds

Threshold Example Reads as
#d 3d 3 days
#h 1h 1 hour
#m 30m 30 minutes
#d#h 1d6h 1 day and 6 hours
#h#m 1h30m 1 hour and 30 minutes

By incorporating these accuracy checks into your workflows, you can proactively monitor and maintain the correctness of your datasets, ensuring they meet your organization's data quality standards.