Skip to content

Supported Data Sources in Flash

This section provides guidance on configuring Flash Service for various types of Depots supported by Flash, including Iceberg, Redshift, Snowflake, and BigQuery formats.

Iceberg format Depots

To configure Flash Service for Iceberg format Depots (e.g., DataOS Lakehouse, AWS S3, Azure ABFSS, Azure WASBS), add the following configuration in the Stack-specific section of the Flash Service manifest file. Replace the placeholders with actual values, customize the query, and adjust scheduling as needed.

# Iceberg
datasets:
  - address: dataos://s3iceberg:iceberg_sink_s3/ice_groot
    name: customer

init:
  - create or replace table mycustomer as (select * from customer)

schedule:
  - expression: "*/2 * * * *"
    sql: INSERT INTO mycustomer BY NAME (select * from customer);

You can retrieve the dataset address from Metis.

Redshift Depot

For configuring Flash Service to work with Redshift Depots, use the following code. Ensure to replace the placeholders, modify the query as needed, and set the appropriate schedule.

# Redshift Example
datasets:
  - name: f_sales
    depot: dataos://redshiftdepot
    sql: SELECT * FROM f_sales
    refresh:
      expression: "*/2 * * * *"
      sql: SELECT MAX(invoice_dt_sk) FROM f_sales
      where: invoice_dt_sk > CURRENT_SQL_RUN_VALUE

Snowflake Depot

To configure Flash Service for Snowflake Depots, use the following configuration. Modify the placeholders and queries as necessary, and adjust the refresh schedule accordingly.

# Snowflake Example
datasets:
  - name: f_sales_sf
    depot: dataos://stsnowflake
    sql: SELECT * FROM public.f_sales
    meta:
      schema: public
    refresh:
      expression: "*/2 * * * *"
      sql: SELECT MAX(invoice_dt_sk) FROM public.f_sales
      where: invoice_dt_sk > PREVIOUS_SQL_RUN_VALUE

BigQuery Depot

For BigQuery Depots, configure the Flash Service using the following code. Update the dataset details, SQL query, and scheduling to fit your requirements.

# BigQuery Example
datasets:
  - name: f_sales
    depot: dataos://bigquery
    sql: SELECT * FROM sales_360.f_sales
    meta:
      bucket: tmdcdemogcs
    refresh:
      expression: "*/2 * * * *"
      sql: SELECT MAX(invoice_dt_sk) FROM sales_360.f_sales
      where: invoice_dt_sk > PREVIOUS_SQL_RUN_VALUE

These configurations will allow Flash to cache datasets from various sources, ensuring efficient data access and query performance.