Google Cloud Storage (GCS)

Read Config

Input Section Configuration for Reading from GCS Data Source

inputs:
  - name: city_connect
    inputType: file
    file:
      warehousePath: 'gs://<bucket-name>/<file-path>' #e.g. gs://sample/data
      schemaName: gcs01
      tableName: gcsTable
      format: iceberg
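
The name given to the input is the alias that downstream transformation steps query against. In the samples on this page, the steps section selects from city_connect and materializes the result as finalDf, which the outputs section then persists:

steps:
  - sequence:
      - name: finalDf
        sql: SELECT * FROM city_connect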

Sample YAML for Reading from GCS Data Source

version: v1
name: standalone-read-gcs
type: workflow
tags:
  - standalone
  - readJob
  - gcs
description: The job ingests city data from GCS into a local Iceberg table
workflow:
  title: Connect City
  dag:
    - name: city-gcs-read-01
      title: Sample Transaction Data Ingester
      description: The job ingests city data from GCS into a local Iceberg table
      spec:
        tags:
          - standalone
          - readJob
          - gcs
        stack: flare:3.0
        compute: runnable-default
        flare:
          job:
            explain: true
            logLevel: INFO
            inputs: # Read from Google Storage
              - name: city_connect
                inputType: file
                file:
                  warehousePath: 'gs://<bucket-name>/<file-path>' #e.g. gs://tmdc-dataos/sampledata
                  schemaName: gcs01
                  tableName: gcsTable
                  format: iceberg

            outputs: # Write to Local System
              - name: finalDf
                outputType: file
                file:
                  format: iceberg
                  warehousePath: /data/examples/dataout/gcsdata/
                  schemaName: default
                  tableName: trans_oms_data3
                  options:
                    saveMode: append

            steps:
              - sequence:
                  - name: finalDf
                    sql: SELECT * FROM city_connect

          sparkConf:
            - 'spark.hadoop.google.cloud.auth.service.account.json.keyfile': 'gcp-demo-sa.json'
            # Keep the JSON key file in the same base directory as the configuration file and sample data.
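
The sample above authenticates with a service-account JSON key file supplied through sparkConf. The underlying Hadoop GCS connector recognizes a few related properties for the same key-file mechanism; the sketch below is an assumption based on the connector's documented settings, not something the sample requires:

sparkConf:
  - 'spark.hadoop.google.cloud.auth.service.account.enable': 'true' # assumption: explicitly enable service-account auth
  - 'spark.hadoop.google.cloud.auth.service.account.json.keyfile': 'gcp-demo-sa.json'
  - 'spark.hadoop.fs.gs.impl': 'com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem' # assumption: often preconfigured by the platform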

Write Config

Output Section Configuration for Writing to GCS Data Source

outputs:
  - name: city_connect
    outputType: file
    file:
      warehousePath: 'gs://<bucket-name>/<file-path>' #e.g. gs://sample/data
      schemaName: gcs01
      tableName: gcsTable
      format: iceberg
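
Write behavior can be tuned through options nested under file, as the local-system output in the read sample above demonstrates. A sketch applying the same saveMode option to the GCS output (bucket and path remain placeholders):

outputs:
  - name: city_connect
    outputType: file
    file:
      warehousePath: 'gs://<bucket-name>/<file-path>'
      schemaName: gcs01
      tableName: gcsTable
      format: iceberg
      options:
        saveMode: append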

Sample YAML for Writing to GCS Data Source

version: v1
name: standalone-write-gcs
type: workflow
tags:
  - standalone
  - writeJob
  - gcs
description: The job ingests city data from a local file source into GCS
workflow:
  title: Connect City
  dag:
    - name: standalone-gcs-write
      title: Sample Transaction Data Ingester
      description: The job ingests city data from a local file into GCS
      spec:
        tags:
          - standalone
          - writeJob
          - gcs
        stack: flare:3.0
        compute: runnable-default
        flare:
          job:
            explain: true
            inputs: # Read from Local System
              - name: city_connect
                inputType: file
                file:
                  path: /transactions/oms_transactions.json
                  format: json
                  isStream: false
            outputs: # Write to Google Storage
              - name: finalDf # write the transformed dataset produced by the steps below, not the raw input
                outputType: file
                file:
                  warehousePath: 'gs://<bucket-name>/<file-path>' 
                  schemaName: gcs01
                  tableName: gcsTable
                  format: iceberg
            steps:
              - sequence:
                  - name: finalDf
                    sql: SELECT * FROM city_connect
          sparkConf:
            - 'spark.hadoop.google.cloud.auth.service.account.json.keyfile': 'gcp-demo-sa.json'
            # Keep the JSON key file in the same base directory as the configuration file and sample data.