Themis¶
Connecting to Themis using Depot/Cluster¶
Prerequisite¶
Ensure you have an active and running Themis Cluster.
Step 1: Prepare the Lens model folder¶
Organize the Lens model folder with the following structure to define tables, views, and governance policies:
model
├── sqls
│ └── sample.sql # SQL script for table dimensions
├── tables
│ └── sample_table.yml # Logical table definition (joins, dimensions, measures, segments)
├── views
│ └── sample_view.yml # Logical views referencing tables
└── user_groups.yml # User group policies for governance
Load data from the data source¶
In the sqls
folder, create .sql
files for each logical table, where each file is responsible for loading or selecting the relevant data from the source. Ensure that only the necessary columns are extracted, and the SQL dialect is specific to the data source.
For example, a simple data load from Postgres
source might look as follows:
SELECT
customer_id as p_customer_id,
cast(purchase_date as timestamp) as purchase_date,
recency as recency_in_days,
mntwines,
mntmeatproducts,
mntfishproducts,
mntsweetproducts,
mntgoldprods,
mntfruits,
numdealspurchases,
numwebpurchases,
numcatalogpurchases,
numstorepurchases,
numwebvisitsmonth,
numwebpurchases + numcatalogpurchases + numstorepurchases + numstorepurchases as purchases,
(mntwines+mntmeatproducts+mntfishproducts+mntsweetproducts+mntgoldprods+mntfruits) as spend
FROM
postgres.public.purchase_data
Similarly, to load the data from another source lakehouse
the sql looks as follows:
SELECT
customer_id,
birth_year,
education,
marital_status,
income,
country
FROM
lakehouse.customer_relationship_management.customer_data
Define the table in the model¶
Create a tables
folder to store logical table definitions, with each table defined in a separate YAML file outlining its dimensions, measures, and segments. For example, to define a table for cutomer
data:
tables:
- name: customer
sql: {{ load_sql('customer') }}
description: "This table stores key details about the customers, including their personal information, income, and classification into different risk segments. It serves as the central reference point for customer-related insights."
public: true
Add dimensions and measures¶
After defining the base table, add the necessary dimensions and measures. For example, to create a table for sales data with measures and dimensions, the YAML definition could look as follows:
tables:
- name: customer
sql: {{ load_sql('customer') }}
description: "This table stores key details about the customers, including their personal information, income, and classification into different risk segments. It serves as the central reference point for customer-related insights."
public: true
joins:
- name: purchase
relationship: one_to_many
sql: "{TABLE.customer_id}= {purchase.p_customer_id}"
dimensions:
- name: customer_id
type: number
column: customer_id
description: "The unique identifier for each customer."
primary_key: true
public: true
#...
measures:
- name: total_customers
sql: "COUNT( Distinct {customer_id})"
type: number
description: "The total number of customers in the dataset, used as a basic measure of customer volume."
Add segments to filter¶
Segments are filters that allow for the application of specific conditions to refine the data analysis. By defining segments, you can focus on particular subsets of data, ensuring that only the relevant records are included in your analysis. For example, to filter for records where the state is either Illinois or Ohio, you can define a segment as follows:
To know more about segments click here.Create the views¶
Create a views
folder to store all logical views, with each view defined in a separate YAML file (e.g., sample_view.yml
). Each view references dimensions, measures, and segments from multiple logical tables. For instance the followingtotal_spend
view is created.
views:
- name: total_spending
description: This metric measures how many marketing campaigns a customer has engaged with.
public: true
meta:
title: Customer Spending by Product Category
tags:
- DPDomain.Marketing
- DPUsecase.Customer Segmentation
- DPUsecase.Product Recommendation
- DPTier.Consumer Aligned
metric:
expression: "*/45 * * * *"
timezone: "UTC"
window: "day"
excludes:
- spend
tables:
- join_path: purchase
prefix: true
includes:
- purchase_date
- customer_id
- total_spend
- spend
- join_path: product
prefix: true
includes:
- product_category
- product_name
To know more about the views click here.
Create user groups¶
This YAML manifest file is used to manage access levels for the semantic model. It defines user groups that organize users based on their access privileges. In this file, you can create multiple groups and assign different users to each group, allowing you to control access to the model.By default, there is a 'default' user group in the YAML file that includes all users.
Multiple user groups can be created in the user_groups.yml
. To know more about the user groups click here.
Create Lens manifest file¶
After preparing the Lens semantic model create a lens_deployemnt.yml
parallel to the model
folder.
version: v1alpha
name: "themis-lens"
layer: user
type: lens
tags:
- lens
description: themis lens deployment on lens2
lens:
compute: runnable-default
secrets:
- name: bitbucket-cred
allKeys: true
source:
type: themis #minerva/themis/depot
name: lenstestingthemis
catalog: icebase
repo:
url: https://bitbucket.org/tmdc/sample
lensBaseDir: sample/lens/source/themis/model
# secretId: lens2_bitbucket_r
syncFlags:
- --ref=main #repo-name
The YAML manifest provided is designed for a cluster named themiscluster
, created on the Themis
source, with a data catalog named icebase
. To utilize this manifest, duplicate the file and update the source details as needed.
Each section of the YAML template outlines essential elements of the Lens deployment. Below is a detailed breakdown of its components:
-
Defining the Source:
-
type
: Thetype
attribute in thesource
section must be explicitly set tothemis
. -
name
: Thename
attribute in thesource
section should specify the name of the Themis Cluster. For example, if the name of your Themis Cluster isclthemis
the Source name would beclthemis
. -
catalog
: Thecatalog
attribute must define the specific catalog name within the Themis Cluster that you intend to use. For instance, if the catalog is namedlakehouse_retail
, ensure this is accurately reflected in the catalog field.
-
-
Defining Repository:
-
url
Theurl
attribute in the repo section specifies the Git repository where the Lens model files are stored. For instance, if your repo name is lensTutorial then the repourl
will be https://bitbucket.org/tmdc/lensTutorial -
lensBaseDir
: ThelensBaseDir
attribute refers to the directory in the repository containing the Lens model. Example:sample/lens/source/depot/awsredshift/model
. -
secretId
: ThesecretId
attribute is used to access private repositories (e.g., Bitbucket, GitHub). It specifies the secret needed to authenticate and access the repository securely. -
syncFlags
: Specifies additional flags to control repository synchronization. Example:--ref=dev
specifies that the Lens model resides in thedev
branch.
-
-
Configure API, Worker, and Metric Settings (Optional): Set up replicas, logging levels, and resource allocations for APIs, workers, routers, and other components.
The above manifest is intended for a cluster named lenstestingthemis
, created on the themis source, with the Depot or data catalog named icebase
. To use this manifest, copy the file and update the source details accordingly.
Step 4: Apply the Lens manifest file¶
After configuring the deployment file with the necessary settings and specifications, apply the manifest using the following command: