Skip to content

Flare Jobs on Delta Table

To execute Flare Jobs on Delta tables using object storage depots, such as Amazon S3, Azure ABFSS, or Google Cloud Storage, a depot must first be created. The format parameter in the Depot configuration file must be set to DELTA, as demonstrated in the following example:

Sample Depot configuration manifest
abfssdelta.yml
version: v1
name: "abfssdelta"
type: depot
tags:
    - Iceberg
    - Sanity
layer: user
depot:
type: ABFSS
description: "ABFSS Iceberg depot for sanity"
compute: runnable-default
spec:
    "account": mockdataos
    "container": dropzone001
    "relativePath": /sanity
    "format": "DELTA"
    "endpointSuffix": dfs.core.windows.net

Flare supports reading from and writing to Delta tables. To enable this functionality, the value delta must be assigned to the format parameter in the inputs section for reading operations, and in the outputs section for writing operations, within the stackSpec. Examples are provided below:

Read Configuration

Following is the config file to read data from Delta tables using Flare:

version: v1
name: abfss-delta-read
type: workflow
tags:
  - Connect
  - City
description: The job ingests city data from dropzone into raw zone
workflow:
  title: Connect City csv
  dag:
    - name: abfss-delta-read-01
      title: City Dimension Ingester
      description: The job ingests city data from dropzone into raw zone
      spec:
        tags:
          - Connect
          - City
        stack: flare:6.0
        compute: runnable-default
        stackSpec:
          job:
            explain: true
            inputs:
              - name: city_connect
                dataset: dataos://abfssdelta:abcd/delta_abfss_wr_jun19_01?acl=rw
                format: delta
                options:
                  headers: true                
                # schemaPath: dataos://sanitys3:none/schemas/avsc/city.avsc

            logLevel: INFO
            outputs:
              - name: finalDf
                dataset: dataos://lakehouse:abcd/delta_abfss_re_jun19_01?acl=rw
                format: Iceberg
                options:
                  saveMode: overwrite
            steps:
              - sequence:
                  - name: finalDf
                    sql: SELECT * FROM city_connect limit 10

Write Configuration

Following is the config file to write data to Delta tables using Flare:

version: v1
name: abfss-delta-write
type: workflow
tags:
  - Connect
  - City
description: The job ingests city data from dropzone into raw zone

workflow:
  title: Connect City
  dag:
    - name: abfss-delta-write-01
      title: City Dimension Ingester
      description: The job ingests city data from dropzone into raw zone
      spec:
        tags:
          - Connect
          - City
        stack: flare:6.0
        compute: runnable-default
        stackSpec:
          job:
            explain: true
            inputs:
              - name: city_connect
                dataset: dataos://thirdparty01:none/city
                format: csv
                schemaPath: dataos://thirdparty01:none/schemas/avsc/city.avsc

            logLevel: INFO
            outputs:
              - name: finalDf
                dataset: dataos://abfssdelta:abcd/delta_abfss_wr_jun19_01?acl=rw
                format: delta
            steps:
              - sequence:
                  - name: finalDf
                    sql: SELECT * FROM city_connect limit 10