Skip to content

Elements of Lens

Tables and Views
Tables and Views

The data model in Lens transforms raw data into meaningful business definitions, optimizing it for analytical purposes. This model is exposed through a robust set of APIs, enabling end-users to run diverse analytical queries without modifying the underlying data model. Let’s take a sales table as an example, containing the following columns:

sales Table

order_id product amount city customer_name
1 Shoes 150 New York John Doe
2 Jacket 200 San Francisco Jane Smith
3 Hat 50 Los Angeles Michael Brown
4 Shoes 120 New York Emily Davis
5 Jacket 220 Chicago Sarah Wilson

To analyze sales data effectively, it is essential to address key business questions, such as:

  • What is the total sales amount?

  • How many orders were placed?

  • What is the average sales amount?

  • How do sales vary across different cities and customers?

Rather than crafting individual SQL queries for each question, the Lens data model provides a robust framework for efficiently organizing and analyzing data to answer business questions. By modeling entities and their relationships, it allows for a structured approach to data analysis. For instance, Lens enables the modeling of entities like sales, orders, and products, ensuring their interactions are clearly defined.

To effectively leverage Lens for data modeling, several core concepts are crucial. These concepts help convert raw data into actionable business insights, facilitating the extraction of meaningful results. The following sections outline the foundational components of the Lens data modeling approach, designed to optimize your analytical workflows.

Table

A table is a logical construct used to define a real-world entity such as customer, product. It contains information about joins (relationships), dimensions, measures, and segments. Each table is defined in a separate YAML file.

We'll use a sample database with two tables, owner and contacts to illustrate the concepts.

owner Table

owner_id owner_name owner_email city
1 Alice Johnson alice@example.com New York
2 Bob Smith bob@example.com San Francisco
3 Charlie Brown charlie@example.com Los Angeles
4 David Williams david@example.com Chicago
5 Emma Davis emma@example.com Miami

contacts Table

contact_id owner_id contact_name contact_email phone_number
1 1 John Doe john.doe@example.com 123-456-7890
2 2 Jane Smith jane.smith@example.com 987-654-3210
3 3 Michael Brown michael.b@example.com 555-123-4567
4 4 Sarah Johnson sarah.j@example.com 444-222-3333
5 5 Robert Wilson robert.w@example.com 777-888-9999

Example

In the example below the owner the table is defined for a sales analytics Lens:

tables:
  - name: owner
    sql: {{ load_sql('owner') }}
    description: Table containing information about the owners.
    data_source: icebase
    public: true   

You can also use the tables attribute to accommodate more complex SQL queries:

tables:
  - name: owner
    sql: >
      SELECT *
      FROM owner_id, email
      WHERE owner.owner_id = contacts.owner_id

Within each table, dimensions, measures, and segments are defined. Joins are used to define relations between tables.

Attribute

The table declaration involves the following attributes:

Property Description Possible Value Best Practice
name Specify the name of the table:
- Should start with a letter only
- Can only contain letters, numbers, and underscores
^[a-zA-Z][a-zA-Z0-9_]*$ Use snake_case. For example: sales_insight
sql Add a reference to the SQL file to map data to the table. SQL mapping is maintained in a separate file NA Keep SQL clearly named for easy maintenance.
description Description of the table's purpose NA Provide a concise yet informative description to aid understanding
public Controls visibility of the table, i.e., whether the table is visible to all users or hidden True, False Use True for tables that should be widely accessible and False for those that are sensitive or irrelevant to most users​
joins Define the relationship with other tables. Left joins are created. The table within which the join is defined is on the left. NA Use joins judiciously to avoid transitive join issues
dimensions Dimensions associated with the table for slicing and dicing data NA Define dimensions that are frequently used in filtering and grouping data​
measures Aggregation over a dimension NA Focus on measures that provide key business insights.
segments An array of segments can be defined. Segments are to filter data based on specific criteria NA Create segments that represent filter groups used in common business queries to enhance usability​

Joins

The joins attribute is used to define relationships between multiple tables(entities), enabling users to access and compare fields from two or more entity tables at the same time.

Key characteristics of Joins:

  • Join Type: All joins are generated as LEFT JOIN. The entity that defines the join serves as the main table, while any entity referenced within the joins property is incorporated in the LEFT JOIN clause.

  • INNER JOIN Semantics: INNER JOIN behavior can be achieved by applying additional filters, such as checking if a column value is NOT NULL. For instance, using a filter not null ensures this requirement is met.

  • Unsupported Join Types:

  • RIGHT OUTER JOIN: To achieve RIGHT OUTER JOIN semantics, define the join from the other side of the relationship.

  • FULL OUTER JOIN: For simplicity in join modeling, full outer joins cannot be directly declared. However, they can be constructed using the SQL parameter within a Lens. These are often used for data blending or similar use cases. In such scenarios, creating a dedicated Lens specifically for the operation is recommended.

There are three types of join relationships (one_to_one, one_to_many, and many_to_one) and a few other concepts.

  • one_to_one:

  • This relationship means that each row in the declaring table corresponds to exactly one row in the joined table.

  • one_to_many:

  • This indicates that one row in the declaring table can be linked to multiple rows in the joined table.

  • many_to_one:

  • This is the reverse of a one_to_many relationship. In this case, multiple rows in the declaring table can match a single row in the joined table.

Attributes

Property Description
name Candidate target table for joining with the base table
relationship Type of join relationship with other tables - one-to-one, one-to-many,many-to-one
sql Join clause Syntax: base_table.key = target_table.key

Example:

In this example, we define the base table and outline its relationship with another table. Specifically, multiple transactions can reference the same product, establishing a many-to-one relationship between the transactions table and the products table. In this context, the placeholder {TABLE} refers to the base table, which is the transactions table.

owner.yml
tables:
  - name: owner
    sql: {{ load_sql('owner') }}
    description: Table containing information about owner records.
    joins:
      - name: contacts
        relationship: one_to_many
        sql: "{TABLE.owner_id}= {contacts.owner_id}"   

Dimensions

Dimensions represent the properties of a single data point in the table. They are descriptive attributes, often text values, that provide context to the data in a fact table. For example, in a sales database, attributes like product_name, category, and region, are called dimensions of that table which help to categorize and provide context for the sales table. The dimensions serve as the guidelines for data analysts to view the information from different angles and apply different filtering criteria.

For example, in the following YAML code snippet, the owner table is defined, with unique attributes such as id and email categorized as dimensions.

owner.yml
tables:

  - name: owner
    sql: {{ load_sql('owner') }}
    description: Table containing information about owner records.

    dimensions:
      - name: owner_id
        type: string
        description: Unique identifier for each owner.
        sql: owner_id
        primary_key: true
        public: true

      - name: email
        type: string
        description: Email address of the owner.
        sql: email

The dimension declaration involves the following properties:

Property Description Possible Value Best Practice
name Unique identifier of the dimension ^[a-zA-Z][a-zA-Z0-9_]*$ Use snake_case.
For example: order_date
title Human-readable title of the dimension. Use ‘title’ to change the display name String Provide a clear and concise title for better readability and user understanding
description Description of the dimension's purpose String Provide a detailed description to explain the dimension's business value
column References the column defined in the table’s SQL. While custom SQL can be defined in a table’s dimension, best practice suggests that it should be defined in the table’s SQL.
public Controls visibility of dimension, i.e. whether the dimension is visible to all users or hidden. If not mentioned explicitly, by default this property is true True, False Set to True for key dimensions that should be visible by default
primary_key The key on which the join relationship will be defined Column name Ensure each dimension has a unique primary key to maintain data integrity
type The data type of the dimension string, number, time, boolean Choose the appropriate data type to ensure proper sorting and filtering​
meta  Ensures sensitive data is masked for specified user groups. Key-value pairs Use metadata to provide additional context about the dimension, such as tags, or custom attributes
sub_query Sub-query for the dimension. Set the flag to reference a measure of one table in dimension of another Use to define complex dimensions using sub-queries

Measures

Measures are quantifications, representing fields such as order subtotal, quantity of items purchased, or duration spent on a specific page. They encompass any measurable numerical value on which statistical calculations can be applied, including sum, average, and variance. For instance, with a measure such as the quantity of items purchased, operations can include calculating the average quantity ordered, sorting by descending quantities, summing all quantities, and performing various statistical analyses. For instance, consider the following YAML definition:

tables:
  - name: sales
    sql: {{ load_sql('sales') }}
    description: Table containing sales records with order details.

    measures:
      - name: total_orders_count
        type: count
        sql: id
        description: Total amount of orders.
    #....

This snippet defines a table named sales, representing sales records with order details. Within the measures block, a measure called total_orders_count is created. This measure, of type count, uses the id column to calculate the total number of orders.

When this measure is requested via an API, Lens will generate and execute SQL similar to the following:

SELECT COUNT(order_id) AS total_orders_count
FROM sales;

When a city dimension is applied to the measure to answer the question "Where are customers based?", Lens generates SQL with a GROUP BY clause, such as:

SELECT city, COUNT(id) AS count
FROM users
GROUP BY 2;

You can include multiple dimensions in your query to perform grouping as needed.

Calculated Measures

To answer the question “What is the average sales amount per order?”, you’ll need to calculate the average amount for all orders. This can be done by defining a measure that calculates the average of the amount field. Here’s how you can define this in YAML for your sales table:

measures:
 - name: total_sales_amount
   sql: SUM(amount)
   type: number
   description: Total sales amount across all orders.

 - name: total_order_count
   sql: COUNT(order_id)
   type: count
   description: Total number of orders.

 - name: average_sales_amount_per_order
   sql: "{total_sales_amount} / NULLIF({total_order_count}, 0)"
   type: number
   format: currency
   description: The average sales amount per order.

# .....

Here we defined a calculated measure average_sales_amount_per_order, which divides the total sales amount by the total number of orders. The NULLIF function prevents division by zero in case there are no orders.

This example shows how you can reference measures inside other measure definitions. When you request the average_sales_amount_per_order measure via an API, the following SQL will be generated:

SELECT
  SUM(amount) / NULLIF(COUNT(order_id), 0) AS average_sales_amount_per_order
FROM
  sales

As with other measures, average_sales_amount_per_order can be used with dimensions.

Measure Additivity

Additivity is a property of measures that tells us if we can break down or combine measure values across different categories. In simpler terms, it means that if we have a measure for a group of dimensions (like sales for different regions), we can add or combine these values to get a measure for a smaller subset of those dimensions (like sales for a specific region).

The additivity of a measure depends on its type. Only measures with the following types are considered additive: count, count_distinct_approx, min, max, and sum. Measures with all other types are considered non-additive.

The measure declaration involves the following properties:

Property Description Possible Value Best Practice Usage
name Unique identifier of the measure ^[a-zA-Z][a-zA-Z0-9_]*$ Use snake_case. For example: total_revenue
sql SQL expression to define the measure In SQL, custom SQL expressions can be defined.
- Useful for specifying formulas.
- Helps calculate other measures in SQL.
title Human-readable title of the measure String Provide a clear and concise title for better readability and user understanding
type The data type of the measure time, string, number, boolean, count, sum, count_distinct, count_distinct_approx, avg, min, max Choose the appropriate type to match the measure's calculation method
description Description of the measure's purpose String Highlight Purpose and Usage
public Controls visibility of the measure True, False Set to True for key measures that should be visible by default
filters Filters applied to the measure SQL conditions To aggregate a measure for a specific category, apply a filter.
For example, apply a filter for product category.
Syntax: {table}.{dimension_name} = 'dimension_value'
e.g. sales.product_category = 'Electronics'
meta Custom metadata
tables:
  - name: owner
    sql: {{ load_sql('owner') }}
    description: Table containing information about owners.

    measures:
      - name: count
        sql: customer_id
        type: count_distinct
        description: Total number of owners.

Segments

Segments are pre-defined groups of filters.

The segment declaration involves the following properties:

Property Description Possible Value
name Specify the name of the dimension. NA
public Controls visibility of dimension, i.e. whether the dimension is visible to all users or hidden True, False
sql Add filter criteria: table.= “dimension_values” NA
meta Custom metadata. This is also used to define secure sub property NA

In the below example, we define a filter for active owners. Similarly, we can create a filter for non-active owners to differentiate.

tables:
  - name: owners
    # ...

    segments:
      - name: active_owners
        sql: "{TABLE}.status = 'active'"

Since the segments attribute type is an array, you can apply as many segments as required.

segments:
  - name: sanfrancisco_sales
    sql: "{TABLE}.city = 'San Francisco'"

  - name: losangeles_sales
    sql: "{TABLE}.city = 'Los Angeles'"

  - name: chicago_sales
    sql: "{TABLE}.city = 'Chicago'"

 #....

A more thorough introduction can be found in Working with Segments.

Views

Views sit on top of the data graph of Lens and create an abstraction of the whole data model with which data consumers can interact. They serve as a layer for defining metrics, providing a simplified interface for end-users to interact objectively with key metrics instead of the entire data model. A view reference serves as a way to access dimensions, measures, and segments from multiple logical tables. It does not define any measures, dimensions, or segments on its own.

The view declaration involves the following properties:

Property Description Possible Values Best Practice
name Unique identifier for the view. ^[a-zA-Z][a-zA-Z0-9_]*$ Use snake_case.
For example: owner_contact_analysis
description Brief explanation of the view's purpose and the data it represents. String Provide a clear and meaningful description for better understanding.
public Defines the visibility of the view—whether it is accessible to all users or restricted. True, False Set to True for widely used views, and False for internal or restricted views.
tables List of tables included in the view and their associated configurations. List Ensure all necessary tables are included for a complete dataset.
join_path Specifies the table(s) from which data is pulled for the view. Table name Define a logical join path to maintain referential integrity.
prefix Indicates whether column names from the table should be prefixed with the table name to prevent conflicts. True, False Use True when including multiple tables with similar column names.
includes Lists specific columns from the table to be included in the view. Column names If a measure is calculated using a dimension, ensure that the corresponding dimension is also included in the view.

In the example below, we create the transaction_analysis view which includes selected members from transactions, and products tables:

views:
  - name: owner_contact_analysis
    description: View containing detailed information about owners and their associated contacts
    public: true

    tables:
      - join_path: owners
        prefix: true
        includes:
          - owner_id
          - owner_name
          - owner_status
          - owner_value
          - owner_city

      - join_path: contacts
        prefix: true
        includes:
          - contact_id
          - contact_name
          - contact_email
          - contact_phone
          - contact_owner_id

A more thorough introduction can be found in Working with views.

Was this page helpful?