Scanner YAML Fields Reference

Syntax for Depot Scan YAML File

stack: scanner:2.0               
compute: runnable-default        
runAsUser: metis                 
stackSpec:
  depot: {{depot name/address}}
  sourceConfig:
    config:
      type: DatabaseMetadata         
      databaseFilterPattern:
        includes/excludes:
          - {{regex}}
      schemaFilterPattern:
        includes/excludes:
          - {{regex}}
      tableFilterPattern:
        includes/excludes:
          - {{regex}}
      markDeletedTables: true/false
      includeViews: true/false
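
As a rough illustration, the template above might be filled in as follows for a depot scan. The depot address and the patterns are borrowed from the examples further down this page; the combination is an assumption, not a tested configuration.

```yaml
stack: scanner:2.0
compute: runnable-default
runAsUser: metis
stackSpec:
  depot: dataos://icebase          # illustrative depot address
  sourceConfig:
    config:
      type: DatabaseMetadata
      schemaFilterPattern:
        excludes:                  # skip system schemas (illustrative patterns)
          - information_schema.*
          - ^sys.*
      markDeletedTables: false
      includeViews: true
```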

Syntax for Non-Depot Scan YAML File

stack: scanner:2.0               
compute: runnable-default        
runAsUser: metis
stackSpec:
  type: {{source type}}                
  source: {{source name}}              
  sourceConnection:                    
    config:
      type: {{source connection type}}
      username: {{username}}
      password: {{password}}
      account: {{account}}
  sourceConfig:                  
    config:
      type: {{metadata type}}         
      databaseFilterPattern:
        includes/excludes:
          - {{regex}}
      schemaFilterPattern:
        includes/excludes:
          - {{regex}}
      tableFilterPattern:
        includes/excludes:
          - {{regex}}
      markDeletedTables: true/false
      includeViews: true/false
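
A filled-in version of the non-depot template could look like the sketch below. It assumes a Snowflake source and reuses the sample values from the attribute examples on this page; the password is a placeholder, and the connection properties required for other source types will differ.

```yaml
stack: scanner:2.0
compute: runnable-default
runAsUser: metis
stackSpec:
  type: snowflake
  source: samplexyz                # name under which metadata is saved in Metastore
  sourceConnection:
    config:
      type: Snowflake
      username: testuser
      password: "<password>"       # placeholder; supply the real credential
      warehouse: WAREHOUSE
      account: NB48718.central-india.azure
  sourceConfig:
    config:
      type: DatabaseMetadata
      databaseFilterPattern:
        includes:
          - TMDCSNOWFLAKEDB
      tableFilterPattern:
        includes:
          - ^cust.*
      markDeletedTables: false
      includeViews: true
```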

Configuration Attributes

spec

Description: Specification of the Scanner workflow.

Data Type | Requirement | Default Value | Possible Value
mapping | Mandatory | |

Example Usage:

spec:
  stack: scanner:2.0 

stack

Description: A Stack is a Resource that serves as a secondary extension point, enhancing the capabilities of a Workflow Resource by introducing additional programming paradigms.

Data Type | Requirement | Default Value | Possible Value
string | Mandatory | None | flare/toolbox/scanner/alpha

Additional Details: Specify the version of the stack along with its name (for example, scanner:2.0). If no version is specified, the system automatically selects the latest version.
Example Usage:

stack: scanner:2.0 

compute

Description: A Compute resource provides processing power for the job.

Data Type | Requirement | Default Value | Possible Value
string | Mandatory | None | runnable-default or any other custom compute created by the user

Example Usage:

compute: runnable-default 

runAsUser

Description: When runAsUser is set to the UserID of the use-case assignee, the workflow is authorized to perform operations on behalf of that user.

Data Type | Requirement | Default Value | Possible Value
string | Mandatory | None | UserID of the Use Case Assignee

Additional information: The default value here is metis. The 'Run as a Scanner user' use case must be granted to run the Scanner workflow.
Example Usage:

runAsUser: metis 

depot

Description: Name or address of the depot. Depot provides a reference to the source from which metadata is read/ingested.

Data Type | Requirement | Default Value | Possible Value
string | Mandatory only in depot scan workflow | None | icebase, redshift_depot, dataos://icebase, etc.

Additional information: The Scanner job scans all the datasets referenced by the depot. The Scanner workflow automatically creates a source (with the same name as the depot) under which the scanned metadata is saved in Metastore.
Example Usage:

stackSpec:   
  depot: dataos://icebase            

type

Description: Type of the dataset to be scanned. This depends on the underlying data source.

Data Type | Requirement | Default Value | Possible Value
string | Mandatory for non-depot scan workflow | None | snowflake, bigquery, redshift, etc.

Example Usage:

stackSpec:
  type: snowflake

source

Description: Name of the source under which the scanned metadata is saved in Metastore. It must be provided explicitly.

Data Type | Requirement | Default Value | Possible Value
string | Mandatory for non-depot scan workflow | None | snowflake001, samplexyz, etc.

Additional information: On Metis UI, sources are listed for databases, messaging, dashboards, workflows, ML models, etc. Under the given source name, you can see information about all the entities scanned for a data source.
Example Usage:

stackSpec:
  source: samplexyz 

sourceConnection

Description: Source connection configuration properties required to connect with the underlying data source to be scanned.

Data Type | Requirement | Default Value | Possible Values
mapping | Mandatory in non-depot scan | None | None

type

Description: Data source type in the sourceConnection section.

Data Type | Requirement | Default Value | Possible Values
string | Mandatory in non-depot scan | None | Redshift, Snowflake, Bigquery, etc.

Example Usage:

sourceConnection:
  config:
    type: Snowflake

username

Description: Username used to connect with the source.

Data Type | Requirement | Default Value | Possible Values
string | Mandatory in non-depot scan | None | testuser, testuser@bi.io

Additional information: The 'sourceConnection' section takes further properties needed to connect with the source, such as password, hostPort, project, email, etc.

Example Usage:

sourceConnection:
  config:
    type: Snowflake   
    username: testuser
    password: {{password}}
    warehouse: WAREHOUSE
    account: NB48718.central-india.azure

sourceConfig

Description: Source configuration properties required to control the metadata scan.

Data Type | Requirement | Default Value | Possible Values
mapping | Mandatory in non-depot scan | None | None

type

Description: Source config type. It specifies the type of metadata to be scanned.

Data Type | Requirement | Default Value | Possible Values
string | Mandatory in non-depot scan | None | DatabaseMetadata, DashboardMetadata

Additional information: The 'sourceConfig' section takes further properties to customize and control metadata scanning.
Example Usage:

sourceConfig:
  config:
    type: DatabaseMetadata

databaseFilterPattern

Description: Determines which databases to include or exclude during metadata ingestion.

Data Type | Requirement | Default Value | Possible Values
mapping | Mandatory | None |

Additional information: Applicable in case of databases/warehouses

includes OR excludes

  • includes: Add an array of regular expressions to this property in the YAML. The Scanner workflow will include any databases whose names match one or more of the provided regular expressions. All other databases will be excluded.
  • excludes: Add an array of regular expressions to this property in the YAML. The Scanner workflow will exclude any databases whose names match one or more of the provided regular expressions. All other databases will be included.

Data Type | Requirement | Default Value | Possible Values
string | Optional | None | Exact values (e.g., 'employee'), regular expressions (e.g., '^sales.*')

Example Usage:

sourceConfig:
  config:
    type: DatabaseMetadata
    databaseFilterPattern:
      includes:
        - TMDCSNOWFLAKEDB

schemaFilterPattern

Description: Determines which schemas to include or exclude during metadata ingestion.

Data Type | Requirement | Default Value | Possible Values
mapping | Mandatory | None |

Additional information: Applicable in case of databases/warehouses

includes OR excludes

  • includes: Add an array of regular expressions to this property in the YAML. The Scanner workflow will include any schemas whose names match one or more of the provided regular expressions. All other schemas will be excluded.
  • excludes: Add an array of regular expressions to this property in the YAML. The Scanner workflow will exclude any schemas whose names match one or more of the provided regular expressions. All other schemas will be included.

Data Type | Requirement | Default Value | Possible Values
string | Optional | None | Exact values (e.g., 'employee'), regular expressions (e.g., '^sales.*')

Example Usage:

sourceConfig:
  config:
    schemaFilterPattern:
      excludes:
        - mysql.*
        - information_schema.*
        - ^sys.*

tableFilterPattern

Description: Determines which tables to include or exclude during metadata ingestion.

Data Type | Requirement | Default Value | Possible Values
mapping | Mandatory | None | Exact values (e.g., 'employee'), regular expressions (e.g., '^sales.*')

Additional information: Applicable in case of databases/warehouses

includes OR excludes

  • includes: Add an array of regular expressions to this property in the YAML to include any tables whose names match one or more of the provided regular expressions. All other tables will be excluded.
  • excludes: Add an array of regular expressions to this property in the YAML. The Scanner workflow will exclude any tables whose names match one or more of the provided regular expressions. All other tables will be included.

Data Type | Requirement | Default Value | Possible Values
string | Optional | None | Exact values (e.g., 'employee'), regular expressions (e.g., '^sales.*')

Example Usage:

sourceConfig:
  config:
    tableFilterPattern:
      includes:
        - ^cust.*

topicFilterPattern

Description: Determines which topics to include or exclude during metadata ingestion.

Data Type | Requirement | Default Value | Possible Values
mapping | Mandatory | None |

Additional information: Applicable in case of stream data.

includes OR excludes

  • includes: Add an array of regular expressions to this property in the YAML to include any topics whose names match one or more of the provided regular expressions. All other topics will be excluded.
  • excludes: Add an array of regular expressions to this property in the YAML to exclude any topics whose names match one or more of the provided regular expressions. All other topics will be included.

Data Type | Requirement | Default Value | Possible Values
string | Optional | None | Exact values (e.g., 'employee'), regular expressions (e.g., '^sales.*')

Example Usage:

sourceConfig:
  config:
    topicFilterPattern:
      includes:
        - ^topic00.*

Filter patterns support regular expressions in both includes and excludes. Refer to the Filter Pattern Examples page for example scenarios.
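
The filter patterns described above can be set together in one sourceConfig, as the syntax templates at the top of this page suggest. The sketch below reuses patterns from the individual examples. How the Scanner treats unanchored patterns or letter case is not stated in this reference, so anchoring patterns with ^ where a prefix match is intended is a cautious assumption.

```yaml
sourceConfig:
  config:
    type: DatabaseMetadata
    databaseFilterPattern:
      includes:
        - TMDCSNOWFLAKEDB        # exact database name
    schemaFilterPattern:
      excludes:
        - information_schema.*   # skip system schemas
        - ^sys.*
    tableFilterPattern:
      includes:
        - ^cust.*                # tables whose names start with "cust"
```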

markDeletedTables

Description: Set this property to true to flag tables as soft-deleted if they are no longer present in the source system.

Data Type | Requirement | Default Value | Possible Values
boolean | Optional | false | true, false

Additional information: If a dataset is deleted from the source but was never ingested into Metis by a previous Scanner run, the scanned metadata on the Metis UI does not change. If the deleted dataset was already ingested into MetisDB by a previous Scanner run, run a Scanner workflow for the specific depot with markDeletedTables: true in the workflow configuration. After a successful run, check the Metis UI to see the tables that have been marked as deleted.
Example Usage:

sourceConfig:
  config:
    markDeletedTables: false
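
To apply the scenario described above, a depot scan could be re-run with the flag switched on. This is a minimal sketch; the depot address is illustrative and should be the depot that was scanned in the earlier run.

```yaml
stackSpec:
  depot: dataos://icebase        # illustrative; use the previously scanned depot
  sourceConfig:
    config:
      type: DatabaseMetadata
      markDeletedTables: true    # flag tables missing from the source as soft-deleted
```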

markDeletedTablesfromFilterOnly

Description: Set this property to true to flag tables as soft-deleted if they are no longer present within the filtered schema or database only.

Data Type | Requirement | Default Value | Possible Values
boolean | Optional | false | true, false

Additional information: This flag is useful when you have more than one ingestion pipeline.
Example Usage:

sourceConfig:
  config:
    markDeletedTablesfromFilterOnly: false
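
As a sketch of the multi-pipeline case, one pipeline might scan only a subset of schemas and limit soft-deletion to that subset. The schema pattern below is hypothetical, and this reference does not say whether markDeletedTables must also be set alongside this flag.

```yaml
sourceConfig:
  config:
    type: DatabaseMetadata
    schemaFilterPattern:
      includes:
        - ^sales.*                          # hypothetical pattern for this pipeline
    markDeletedTablesfromFilterOnly: true   # soft-delete only within the filtered schemas
```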

ingestSampleData

Description: Set this property to true to ingest sample data from the topics.

Data Type | Requirement | Default Value | Possible Values
boolean | Optional | false | true, false

Example Usage:

sourceConfig:
  config:
    ingestSampleData: false

markDeletedTopics

Description: Set this property to true to flag topics as soft-deleted if they are not present anymore in the source system.

Data Type | Requirement | Default Value | Possible Values
boolean | Optional | false | true, false

Example Usage:

sourceConfig:
  config:
    markDeletedTopics: false
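
For stream data, the topic-related properties might be combined as in the sketch below. The topic pattern comes from the topicFilterPattern example; the metadata type for messaging sources is not listed in this reference, so it is left as a comment rather than guessed.

```yaml
sourceConfig:
  config:
    # type: set to the metadata type for your messaging source
    #       (the value is not listed in this reference)
    topicFilterPattern:
      includes:
        - ^topic00.*
    ingestSampleData: true     # ingest sample data from the topics
    markDeletedTopics: true    # flag topics missing from the source as soft-deleted
```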

includeViews

Description: Set this property to true to include views in the metadata scan.

Data Type | Requirement | Default Value | Possible Values
boolean | Optional | false | true, false

Example Usage:

sourceConfig:
  config:
    includeViews: true

enableDebugLog

Description: Set this property to true to set the default log level to debug.

Data Type | Requirement | Default Value | Possible Values
boolean | Optional | false | true, false

Example Usage:

sourceConfig:
  config:
    enableDebugLog: true