Filter Pattern Examples¶
The scanner stack offers a range of filter patterns, including the Database Filter Pattern
, Schema Filter Pattern
, and Table Filter Pattern
for data sources such as databases and data warehouses. Likewise, in the context of messaging pipelines, you can employ the topic filter pattern
. These filters enables users to exercise control over metadata scanning.
Filter patterns¶
-
databaseFilterPattern
: Use this pattern to filter databases at the highest level. Specify the databases you want to include or exclude based on certain criteria. This filtering step will narrow down the scope of the subsequent filtering operations. -
schemaFilterPattern
: Apply this pattern to filter schemas within the selected databases from the previous step. It allows you to include or exclude specific schemas based on your requirements. This filtering further refines the scope for the final step. -
tableFilterPattern
: Finally, utilize this pattern to filter tables within the selected schemas. Specify the tables you want to include or exclude based on your criteria. This filtering step operates on the remaining schemas after applying the previous two filters.
Configure Filters in Scanner YAML¶
It's important to note that filters exclusively support regular expressions. This document will guide you through the use of different filter types by providing suitable regex patterns based on the given situation. Metadata filters can be configured in Scanner YAML under the sourceConfig
section.
sourceConfig:
config:
databaseFilterPattern:
includes:
- database1
- database2
excludes:
- database3
- database4
schemaFilterPattern:
includes:
- schema1
- schema2
excludes:
- schema3
- schema4
tableFilterPattern:
includes:
- table1
- table2
excludes:
- table3
- table4
Example Scenarios¶
Let us consider a scenario where we aim to ingest metadata from a Snowflake instance that comprises multiple databases, as shown below. These databases contain various schemas and tables.
β
ββββ SNOWFLAKE # DB Name
β
ββββ SNOWFLAKE_SAMPLE_DATA # DB Name
β
ββββ TEST_SNOWFLAKE_DB # DB Name
β
ββββ TEST_HEALTHCARE # DB Name
β
ββββ TEST_SPORTS_RETAIL # DB Name
β
ββββ TEST_DUMMY_DB # DB Name
β
ββββ RETAIL_DB # DB Name
Database Filters¶
Use databaseFilterPattern
to determine which databases to include/exclude during metadata ingestion.
Example 1
In this particular example, our objective is to ingest metadata of all databases that include the term "SNOWFLAKE" in their names. To achieve this, we would apply the filter pattern .*SNOWFLAKE.*
in the includes
property. Consequently, this filter pattern will ensure the ingestion of databases such as Β SNOWFLAKE
,Β SNOWFLAKE_SAMPLE_DATA
Β andΒ TEST_SNOWFLAKE_DB
.
Example 2
If we want to scan only databases that start with SNOWFLAKE
Β Β then the filter pattern regex applied would be ^SNOWFLAKE.*
and they will include SNOWFLAKE
,Β SNOWFLAKE_SAMPLE_DATA
.
Example 3
In order to exclusively scan the only database with name SNOWFLAKE
, the filter pattern would be ^SNOWFLAKE$
Example 4
In this example, we want to ingest metadata of all databases for which the name starts withΒ TEST
Β OR ends withΒ DB
Β , then the filter patternΒ applied would beΒ ^TEST
Β &Β DB$
Β in the includes property. The scanning process will include databases such as Β TEST_SNOWFLAKEDB
Β &Β DUMMY_DB
.
Schema Filters¶
Schema filter patterns determine which schemas to include/exclude during metadata ingestion. These are just examples of schemas that could exist in a Snowflake instance, taken for demonstration purpose. The actual schemas and table names may vary based on your specific use case and requirements.
β
ββββ SNOWFLAKE # DB Name
β β
β ββββ PUBLIC # Schema Name
β β
β ββββ TPCH_SF1 # Schema Name
β
β ββββ TPCH_SF2 # Schema Name
β β
β ββββ INFORMATION_SCHEMA # Schema Name
β
ββββ SNOWFLAKE_SAMPLE_DATA # DB Name
β β
β ββββ PUBLIC # Schema Name
β β
β ββββ INFORMATION_SCHEMA # Schema Name
β β
β ββββ TPCH_SF1 # Schema Name
β β
β ββββ TPCH_SF10 # Schema Name
β β
β ββββ TPCH_SF100 # Schema Name
Example 1
Let's consider a scenario where we want to scan the metadata from the "public" schema present in all databases. The filter pattern would be public
. This will include the schema public
present in databases SNOWFLAKE
Β SNOWFLAKE_SAMPLE_DATA
.
Example 2
We wish to exclude the schema TPCH_SF100
from metadata scanning. As this schema is present only in one database, you can use excludes
property with the pattern ^TPCH_SF100$
.
Example 3
Suppose we intend to include all schemas that begin with the prefix "TPCH" in all databases. In this case, the appropriate regular expression (regex) to achieve this would be ^TPCH.*
. This regex pattern will result in the metadata scan of schemas such as TPCH_SF1
and TPCH_SF2
from the SNOWFLAKE
database, as well as schemas like TPCH_SF1
, TPCH_SF10
, and TPCH_SF100
from the SNOWFLAKE_SAMPLE_DATA
database.
Example 4
We want to include only the schema TPCH_SF1 present in all the databases but not TPCH_SF100 or TPCH_SF1000.
Table Filter pattern¶
Use tableFilterPattern
to determine which tables to include/exclude during metadata ingestion.
β
ββββ TEST_SPORTS_RETAIL # DB Name
β β
β ββββ PUBLIC # Schema Name
β β β
β β ββββ CUSTOMER # Table Name
β β β
β β ββββ CUSTOMER_ADDRESS # Table Name
β β β
β β ββββ CUSTOMER_DEMOGRAPHICS # Table Name
β β β
β β ββββ CALL_CENTER # Table Name
β β
β ββββ INFORMATION # Schema Name
β β
β ββββ ORDERS # Table Name
β β
β ββββ REGION # Table Name
β β
β ββββ CUSTOMER # Table Name
β
ββββ RETAIL_DB # DB Name
β
ββββ PUBLIC # Schema Name
β
ββββ CUSTOMER_ONLINE # Table Name
β
ββββ CUSTOMER_OFFLINE # Table Name
β
ββββ LOYAL_CUSTOMER # Table Name
Example 1
In this example, our objective is to scan only the table with the name CUSTOMER
from all the schemas of TEST_SPORTS_RETAIL
database. To achieve this, we can use the filter pattern ^CUSTOMER$
. By applying this pattern during the metadata scanning process, the tables named CUSTOMER
from the TEST_SPORTS_RETAIL.PUBLIC
and TEST_SPORTS_RETAIL.INFORMATION
schemas will be included.
sourceConfig:
config:
...
databaseFilterPattern:
includes:
- TEST_SPORTS_RETAIL
tableFilterPattern:
includes:
- ^CUSTOMER$
Example 2
In this example scenario, we want to scan all the tables having CUSTOMER in their name. We will use the filter pattern .CUSTOMER.. This will result in scanning of all the tables such as CUSTOMER
, CUSTOMER_ONLINE
, CUSTOMER_OFFLINE
, LOYAL_CUSTOMER
, etc.