MinervaΒΆ
To create a semantic model across multiple data sources using the Minerva Cluster, several prerequisite steps must be completed. These steps ensure proper setup and secure connections between the data sources and the Minerva Cluster.
PrerequisitesΒΆ
The following prerequisites are required:
Create Instance-Secret for data source connectionΒΆ
Before establishing a connection to the data source, an Instance Secret must be created. This secret securely stores the credentials required for read
(r
) and read write
(rw
) access to the data source.
For instance, to create the Depot for the source Postgres, create the Instance Secret as following:
name: postgres-r
version: v1
type: instance-secret
description: read and write ds appysecret for User postgres db
layer: user
instance-secret:
type: key-value-properties
acl: r
data:
username:
password:
name: postgres-rw
version: v1
type: instance-secret
description: read and write ds appysecret for User postgres db
layer: user
instance-secret:
type: key-value-properties
acl: rw
data:
username:
password:
Create Postgres Depot manifest fileΒΆ
Create Postgres Depot to set up the connection with the Postgres data source.
name: postgres
version: v2alpha
type: depot
layer: user
depot:
type: jdbc
description: default postgres depot
external: true
secrets:
- name: postgres-rw
keys:
- postgres-rw
allkeys: true
- name: postgres-r
keys:
- postgres-r
allkeys: true
jdbc:
subprotocol: postgresql
host: <host_address> #replace host address with actual host address
port: 5432
database: postgres
Create the Scanner manifest fileΒΆ
Create a scanner to scan the available dataset in the Postgres Depot and register it in the Metis.
version: v1
name: wf-postgres-depot
type: workflow
tags:
- postgres-depot-scan
description: The job scans schema tables and register data to metis
workflow:
dag:
- name: postgres-depot
description: The job scans schema from postgres depot tables and register data to metis
spec:
tags:
- scanner
stack: scanner:2.0
compute: runnable-default
stackSpec:
depot: dataos://postgres
Create Minerva Cluster manifest fileΒΆ
name: minervacluster
version: v1
type: cluster
tags:
- dataos:type:resource
- dataos:resource:cluster
- dataos:layer:user
- dataos:workspace:public
owner: iamgroot
workspace: public
cluster:
compute: query-default
type: minerva
minerva:
replicas: 1
resources:
requests:
cpu: 500m
memory: 1Gi
limits:
cpu: 500m
memory: 1Gi
depots:
- address: dataos://postgres?acl=r #postgres depot created
- address: dataos://lakehouse?acl=r #another depot
debug:
logLevel: INFO
trinoLogLevel: ERROR
Prepare the semantic model folderΒΆ
Organize the semantic model folder with the following structure to define tables, views, and governance policies:
model
βββ sqls
β βββ sample.sql # SQL script for table dimensions
βββ tables
β βββ sample_table.yml # Logical table definition (joins, dimensions, measures, segments)
βββ views
β βββ sample_view.yml # Logical views referencing tables
βββ user_groups.yml # User group policies for governance
Load data from the data sourceΒΆ
In the sqls
folder, create .sql
files for each logical table, where each file is responsible for loading or selecting the relevant data from the source. Ensure that only the necessary columns are extracted, and the SQL dialect is specific to the data source.
For example, a simple data load from Postgres
source might look as follows:
SELECT
customer_id as p_customer_id,
cast(purchase_date as timestamp) as purchase_date,
recency as recency_in_days,
mntwines,
mntmeatproducts,
mntfishproducts,
mntsweetproducts,
mntgoldprods,
mntfruits,
numdealspurchases,
numwebpurchases,
numcatalogpurchases,
numstorepurchases,
numwebvisitsmonth,
numwebpurchases + numcatalogpurchases + numstorepurchases + numstorepurchases as purchases,
(mntwines+mntmeatproducts+mntfishproducts+mntsweetproducts+mntgoldprods+mntfruits) as spend
FROM
postgres.public.purchase_data
Similarly, to load the data from another source lakehouse
the sql looks as follows:
SELECT
customer_id,
birth_year,
education,
marital_status,
income,
country
FROM
lakehouse.customer_relationship_management.customer_data
Define the table in the modelΒΆ
Create a tables
folder to store logical table definitions, with each table defined in a separate YAML file outlining its dimensions, measures, and segments. For example, to define a table for cutomer
data:
tables:
- name: customer
sql: {{ load_sql('customer') }}
description: "This table stores key details about the customers, including their personal information, income, and classification into different risk segments. It serves as the central reference point for customer-related insights."
public: true
Add dimensions and measuresΒΆ
After defining the base table, add the necessary dimensions and measures. For example, to create a table for sales data with measures and dimensions, the YAML definition could look as follows:
tables:
- name: customer
sql: {{ load_sql('customer') }}
description: "This table stores key details about the customers, including their personal information, income, and classification into different risk segments. It serves as the central reference point for customer-related insights."
public: true
joins:
- name: purchase
relationship: one_to_many
sql: "{TABLE.customer_id}= {purchase.p_customer_id}"
dimensions:
- name: customer_id
type: number
column: customer_id
description: "The unique identifier for each customer."
primary_key: true
public: true
#...
measures:
- name: total_customers
sql: "COUNT( Distinct {customer_id})"
type: number
description: "The total number of customers in the dataset, used as a basic measure of customer volume."
Add segments to filterΒΆ
Segments are filters that allow for the application of specific conditions to refine the data analysis. By defining segments, you can focus on particular subsets of data, ensuring that only the relevant records are included in your analysis. For example, to filter for records where the state is either Illinois or Ohio, you can define a segment as follows:
To know more about segments click here.Create the viewsΒΆ
Create a views
folder to store all logical views, with each view defined in a separate YAML file (e.g., sample_view.yml
). Each view references dimensions, measures, and segments from multiple logical tables. For instance the followingtotal_spend
view is created.
views:
- name: total_spending
description: This metric measures how many marketing campaigns a customer has engaged with.
public: true
meta:
title: Customer Spending by Product Category
tags:
- DPDomain.Marketing
- DPUsecase.Customer Segmentation
- DPUsecase.Product Recommendation
- DPTier.Consumer Aligned
metric:
expression: "*/45 * * * *"
timezone: "UTC"
window: "day"
excludes:
- spend
tables:
- join_path: purchase
prefix: true
includes:
- purchase_date
- customer_id
- total_spend
- spend
- join_path: product
prefix: true
includes:
- product_category
- product_name
To know more about the views click here.
Create user groupsΒΆ
This YAML manifest file is used to manage access levels for the semantic model. It defines user groups that organize users based on their access privileges. In this file, you can create multiple groups and assign different users to each group, allowing you to control access to the model.By default, there is a 'default' user group in the YAML file that includes all users.
Multiple user groups can be created in the user_groups.yml
. To know more about the user groups click here.
Create an Instance Secret manifest fileΒΆ
# RESOURCE META SECTION
name: bitbucket-cred # Secret Resource name (mandatory)
version: v1 # Secret manifest version (mandatory)
type: instance-secret # Type of Resource (mandatory)
description: demo-secret read secrets for code repository # Secret Resource description (optional)
layer: user # DataOS Layer (optional)
# INSTANCE SECRET-SPECIFIC SECTION
instance-secret:
type: key-value # Type of Instance-secret (mandatory)
acl: r # Access control list (mandatory)
data: # Data (mandatory)
GITSYNC_USERNAME: <code_repository_username>
GITSYNC_PASSWORD: <code_repository_password>
Create a Lens deployment manifest fileΒΆ
After preparing the Lens semantic model create a lens_deployemnt.yml
parallel to the model
folder.
version: v1alpha
name: "minervalens"
layer: user
type: lens
tags:
- lens
description: minerva deployment on lens2
lens:
compute: runnable-default
secrets:
- name: bitbucket-cred #repo-cred
allKeys: true
source:
type: minerva #minerva/themis/depot
name: minervacluster #name of minerva cluster
catalog: lakehouse
repo:
url: https://bitbucket.org/tmdc/sample
lensBaseDir: sample/lens/source/minerva/model
# secretId: lens2_bitbucket_r
syncFlags:
- --ref=lens
The manifest file provided is designed for a Cluster named minervacluster
, created on the Minerva
source, with a data catalog named lakehouse
. To utilize this manifest, duplicate the file and update the source details as needed.
Each section of the YAML template outlines essential elements of the Lens deployment. Below is a detailed breakdown of its components:
-
Defining the source:
-
type
: Thetype
attribute in thesource
section must be explicitly set tominerva
. -
name
: Thename
attribute in thesource
section should specify the name of the Minerva Cluster. For example, if the name of your Minerva Cluster isminervacluster
the Source name would beminervacluster
. -
catalog
: Thecatalog
attribute must define the specific catalog name within the Minerva Cluster that you intend to use. For instance, if the catalog is namedlakehouse
, ensure this is accurately reflected in the catalog field.
-
-
Defining repository:
-
url
Theurl
attribute in the repo section specifies the Git repository where the Lens model files are stored. For instance, if your repo name is lensTutorial then the repourl
will be https://bitbucket.org/tmdc/lensTutorial -
lensBaseDir
: ThelensBaseDir
attribute refers to the directory in the repository containing the Lens model. Example:sample/lens/source/cluster/minerva/model
. -
secretId
: ThesecretId
attribute is used to access private repositories (e.g., Bitbucket, GitHub). It specifies the secret needed to authenticate and access the repository securely. -
syncFlags
: Specifies additional flags to control repository synchronization. Example:--ref=dev
specifies that the Lens model resides in thedev
branch.
-
-
Configure API, Worker, and Metric Settings (Optional): Set up replicas, logging levels, and resource allocations for APIs, workers, routers, and other components.
The above YAML manifest is intended for a Cluster named minervacluster
, created on the minerva source, with the data catalog named lakehouse
. To use this manifest file, copy the file and update the source details accordingly.
Step 4: Apply the Lens manifest fileΒΆ
After configuring the deployment file with the necessary settings and specifications, apply the manifest using the following command: