Scanner YAML Fields Reference
Syntax for Depot Scan YAML File
stack: scanner:2.0
compute: runnable-default
runAsUser: metis
stackSpec:
  depot: {{depot name/address}}
  sourceConfig:
    config:
      type: DatabaseMetadata
      databaseFilterPattern:
        includes/excludes:
          - {{regex}}
      schemaFilterPattern:
        includes/excludes:
          - {{regex}}
      tableFilterPattern:
        includes/excludes:
          - {{regex}}
      markDeletedTables: true/false
      includeViews: true/false
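As a filled-in illustration of the template above, the following sketch scans the icebase depot. The depot address comes from the possible values listed for depot in this reference; the filter regex and the flag values are illustrative assumptions, not recommended settings.

```yaml
stack: scanner:2.0
compute: runnable-default
runAsUser: metis
stackSpec:
  depot: dataos://icebase        # depot address, as listed under "depot"
  sourceConfig:
    config:
      type: DatabaseMetadata
      schemaFilterPattern:
        includes:
          - '^sales.*'           # illustrative regex: schemas starting with "sales"
      markDeletedTables: false   # illustrative value
      includeViews: true         # illustrative value
```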
Syntax for Non-Depot Scan YAML File
stack: scanner:2.0
compute: runnable-default
runAsUser: metis
stackSpec:
  type: {{source type}}
  source: {{source name}}
  sourceConnection:
    config:
      type: {{source connection type}}
      username: {{username}}
      password: {{password}}
      account: {{account}}
  sourceConfig:
    config:
      type: {{metadata type}}
      databaseFilterPattern:
        includes/excludes:
          - {{regex}}
      schemaFilterPattern:
        includes/excludes:
          - {{regex}}
      tableFilterPattern:
        includes/excludes:
          - {{regex}}
      markDeletedTables: true/false
      includeViews: true/false
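A filled-in sketch of the non-depot template might look like the following. The Snowflake connection values are the sample values used elsewhere in this reference; the source name, filter regex, and flag values are illustrative assumptions.

```yaml
stack: scanner:2.0
compute: runnable-default
runAsUser: metis
stackSpec:
  type: snowflake
  source: snowflake001           # name under which metadata appears in Metis
  sourceConnection:
    config:
      type: Snowflake
      username: testuser
      password: '******'         # placeholder, quoted so it parses as a string
      warehouse: WAREHOUSE
      account: NB48718.central-india.azure
  sourceConfig:
    config:
      type: DatabaseMetadata
      tableFilterPattern:
        includes:
          - '^sales.*'           # illustrative regex
      markDeletedTables: false   # illustrative value
      includeViews: true         # illustrative value
```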
Configuration Attributes
spec
Description: Specs of the Scanner workflow
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | Mandatory |
Example Usage:
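A minimal sketch, assuming the attributes described in this reference are nested directly under spec (the nesting is inferred from the description, not confirmed by the syntax sections):

```yaml
spec:
  stack: scanner:2.0
  compute: runnable-default
  runAsUser: metis
  stackSpec:
    depot: dataos://icebase   # sample depot address from this reference
```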
stack
Description: A Stack is a Resource that serves as a secondary extension point, enhancing the capabilities of a Workflow Resource by introducing additional programming paradigms.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | Mandatory | None | flare/toolbox/scanner/alpha |
Additional Details: You can also specify a particular version of the stack (for example, scanner:2.0). If no version is explicitly specified, the system automatically selects the latest version by default.
Example Usage:
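For instance, the syntax sections of this reference pin the Scanner stack to version 2.0:

```yaml
stack: scanner:2.0   # stack name, then a colon and the version
```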
compute
Description: A Compute resource provides processing power for the job.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | Mandatory | None | runnable-default or any other custom compute created by the user |
Example Usage:
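The syntax sections of this reference use the default Compute resource, which can be sketched as:

```yaml
compute: runnable-default   # or the name of a custom Compute resource
```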
runAsUser
Description: When the "runAsUser" field is configured with the UserID of the use-case assignee, it grants the authority to perform operations on behalf of that user.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | Mandatory | None | UserID of the Use Case Assignee |
Additional information: The default value here is metis. However, the 'Run as a Scanner user' use case must be granted to run the Scanner workflow.
Example Usage:
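A minimal sketch using the metis user mentioned above; any other UserID would be an assumption about your environment:

```yaml
runAsUser: metis   # requires the 'Run as a Scanner user' use case
```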
depot
Description: Name or address of the depot. Depot provides a reference to the source from which metadata is read/ingested.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | Mandatory only in depot scan workflow | None | icebase, redshift_depot, dataos://icebase, etc. |
Additional information: The Scanner job scans all the datasets referred to by the depot. The Scanner workflow automatically creates a source (with the same name as the depot) under which the scanned metadata is saved within Metastore.
Example Usage:
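A minimal sketch, assuming depot sits directly under stackSpec as in the depot scan syntax. Either the depot name or its address can be given; both values come from the table above.

```yaml
stackSpec:
  depot: dataos://icebase   # address form; the name form would be "icebase"
```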
type
Description: Type of the dataset to be scanned. This depends on the underlying data source.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | Mandatory for non-depot scan workflow | None | snowflake, bigquery, redshift, etc. |
Example Usage:
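A minimal sketch for a non-depot scan, using one of the possible values listed above:

```yaml
stackSpec:
  type: snowflake   # depends on the underlying data source
```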
source
Description: Here you need to explicitly provide the source name where the scanned metadata is saved within Metastore.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | Mandatory for non-depot scan workflow | None | snowflake001, samplexyz, etc. |
Additional information: On Metis UI, sources are listed for databases, messaging, dashboards, workflows, ML models, etc. Under the given source name, you can see the information about all the entities scanned for a data source.
Example Usage:
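A minimal sketch, pairing source with type as in the non-depot syntax; snowflake001 is one of the sample names listed above:

```yaml
stackSpec:
  type: snowflake
  source: snowflake001   # scanned entities are listed under this name on Metis UI
```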
sourceConnection
Description: Source connection configuration properties required to connect with the underlying data source to be scanned.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
mapping | Mandatory in non-depot scan | None | None |
type
Description: Data source type in the sourceConnection section.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
string | Mandatory in non-depot scan | None | Redshift, Snowflake, Bigquery, etc. |
Example Usage:
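A minimal sketch showing only the connection type; a working configuration would also need the credentials described in the following attributes:

```yaml
sourceConnection:
  config:
    type: Snowflake   # one of the possible values listed above
```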
username
Description: Username used to connect to the source.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
string | Mandatory in non-depot scan | None | testuser, testuser@bi.io |
Additional information: There will be more properties under the 'sourceConnection' section to be able to connect with the source such as password, hostPort, project, email, etc.
Example Usage:
sourceConnection:
  config:
    type: Snowflake
    username: testuser
    password: '******'
    warehouse: WAREHOUSE
    account: NB48718.central-india.azure
sourceConfig
Description: Source configuration properties required to control the metadata scan.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
mapping | Mandatory in non-depot scan | None | None |
type
Description: Source config type, which specifies the type of metadata to be scanned.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
string | Mandatory in non-depot scan | None | DatabaseMetadata, DashboardMetadata |
Additional information: There will be more properties under the 'sourceConfig' section to customize and control metadata scanning.
Example Usage:
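A minimal sketch for scanning database metadata, using a value from the table above:

```yaml
sourceConfig:
  config:
    type: DatabaseMetadata   # DashboardMetadata is the other value listed here
```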
databaseFilterPattern
Description: To determine which databases to include/exclude during metadata ingestion.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
mapping | Mandatory | None |
Additional information: Applicable in case of databases/warehouses
includes OR excludes
includes: Add an array of regular expressions to this property in the YAML. The Scanner workflow will include any databases whose names match one or more of the provided regular expressions. All other databases will be excluded.
excludes: Add an array of regular expressions to this property in the YAML. The Scanner workflow will exclude any databases whose names match one or more of the provided regular expressions. All other databases will be included.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
string | Optional | None | Exact values (e.g., 'employee'), regular expressions (e.g., '^sales.*') |
Example Usage:
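A sketch of an includes filter, assuming the nesting shown in the syntax sections and reusing the sample values from the table above:

```yaml
sourceConfig:
  config:
    type: DatabaseMetadata
    databaseFilterPattern:
      includes:
        - '^sales.*'   # regex: databases whose names start with "sales"
        - employee     # exact database name
```

With this filter, databases that match neither entry are left out of the scan.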
schemaFilterPattern
Description: To determine which schemas to include/exclude during metadata ingestion.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
mapping | Mandatory | None |
Additional information: Applicable in case of databases/warehouses
includes OR excludes
includes: Add an array of regular expressions to this property in the YAML. The Scanner workflow will include any schemas whose names match one or more of the provided regular expressions. All other schemas will be excluded.
excludes: Add an array of regular expressions to this property in the YAML. The Scanner workflow will exclude any schemas whose names match one or more of the provided regular expressions. All other schemas will be included.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
string | Optional | None | Exact values (e.g., 'employee'), regular expressions (e.g., '^sales.*') |
Example Usage:
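A sketch of an excludes filter for schemas, assuming the same nesting as the syntax sections; the regex is the sample value from the table above:

```yaml
sourceConfig:
  config:
    type: DatabaseMetadata
    schemaFilterPattern:
      excludes:
        - '^sales.*'   # skip schemas whose names start with "sales"
```

Every schema that does not match the pattern is still scanned.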
tableFilterPattern
Description: To determine which tables to include/exclude during metadata ingestion.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
mapping | Mandatory | None | Exact values (e.g., 'employee'), regular expressions (e.g., '^sales.*') |
Additional information: Applicable in case of databases/warehouses
includes OR excludes
includes: Add an array of regular expressions to this property in the YAML to include any tables whose names match one or more of the provided regular expressions. All other tables will be excluded.
excludes: Add an array of regular expressions to this property in the YAML. The Scanner workflow will exclude any tables whose names match one or more of the provided regular expressions. All other tables will be included.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
string | Optional | None | Exact values (e.g., 'employee'), regular expressions (e.g., '^sales.*') |
Example Usage:
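A sketch of a table filter, assuming the nesting from the syntax sections and the sample values from the table above:

```yaml
sourceConfig:
  config:
    type: DatabaseMetadata
    tableFilterPattern:
      includes:
        - employee     # exact table name
        - '^sales.*'   # regex: tables whose names start with "sales"
```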
topicFilterPattern
Description: To determine which topics to include/exclude during metadata ingestion.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
mapping | Mandatory | None |
Additional information: Applicable in case of stream data.
includes OR excludes
includes: Add an array of regular expressions to this property in the YAML to include any topics whose names match one or more of the provided regular expressions. All other topics will be excluded.
excludes: Add an array of regular expressions to this property in the YAML to exclude any topics whose names match one or more of the provided regular expressions. All other topics will be included.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
string | Optional | None | Exact values (e.g., 'employee'), regular expressions (e.g., '^sales.*') |
Example Usage:
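A sketch for stream data, assuming topicFilterPattern sits under sourceConfig.config like the other filter patterns. The metadata type for stream sources is not listed in this reference, so it is left out here.

```yaml
sourceConfig:
  config:
    topicFilterPattern:
      excludes:
        - '^sales.*'   # illustrative regex: skip topics starting with "sales"
```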
Filter patterns support regex in includes and excludes expressions. Refer to the Filter Pattern Examples page for example scenarios.
markDeletedTables
Description: Set the Mark Deleted Tables property to true to flag tables as soft-deleted if they are not present anymore in the source system.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
boolean | Optional | false | true, false |
Additional information: If a dataset is deleted from the source and hasn't been ingested in Metis during a previous scanner run, there will be no visible change in the scanned metadata on the Metis UI. However, if the deleted dataset has already been ingested in MetisDB from previous scanner runs, users can run a scanner workflow for the specific depot they want to scan with the markDeletedTables: true option in the workflow configuration. After a successful run, users can check the Metis UI to see the tables that have been marked as deleted.
Example Usage:
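A sketch of a depot scan with soft-deletion enabled, following the depot scan syntax; the depot address is a sample value from this reference:

```yaml
stackSpec:
  depot: dataos://icebase
  sourceConfig:
    config:
      type: DatabaseMetadata
      markDeletedTables: true   # flag tables that no longer exist in the source
```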
markDeletedTablesfromFilterOnly
Description: Set this property to true to flag tables as soft-deleted if they are no longer present within the filtered schema or database only.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
boolean | Optional | false | true, false |
Additional information: This flag is useful when you have more than one ingestion pipeline.
Example Usage:
ingestSampleData
Description: Set this property to true to ingest sample data from the topics.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
boolean | Optional | false | true, false |
Example Usage:
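A minimal sketch, assuming the property is set under sourceConfig.config alongside the other scan options:

```yaml
sourceConfig:
  config:
    ingestSampleData: true   # also ingest sample data from the scanned topics
```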
markDeletedTopics
Description: Set this property to true to flag topics as soft-deleted if they are not present anymore in the source system.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
boolean | Optional | false | true, false |
Example Usage:
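A minimal sketch, assuming the property is set under sourceConfig.config in the same way as markDeletedTables:

```yaml
sourceConfig:
  config:
    markDeletedTopics: true   # flag topics that no longer exist in the source
```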
includeViews
Description: Set this property to true to include views in metadata scanning.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
boolean | Optional | false | true, false |
Example Usage:
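A minimal sketch, with the placement taken from the syntax sections of this reference:

```yaml
sourceConfig:
  config:
    type: DatabaseMetadata
    includeViews: true   # scan views in addition to tables
```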
enableDebugLog
Description: Set this property to true to set the default log level to debug.
Data Type | Requirement | Default Value | Possible Values |
---|---|---|---|
boolean | Optional | false | true, false |
Example Usage:
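A minimal sketch, assuming enableDebugLog is set under sourceConfig.config like the other optional flags (its exact placement is not shown in the syntax sections):

```yaml
sourceConfig:
  config:
    type: DatabaseMetadata
    enableDebugLog: true   # assumed placement; raises the log level to debug
```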