Amazon Redshift

Read Config

Input Section Configuration for Reading from Redshift Data Source

inputs:
  - name: oms_transactions_data
    inputType: redshift
    redshift:
      jdbcUrl: jdbc:redshift://<redshift-host-address>/<database>
      tempDir: s3a://<bucket>/<dir>
      username: <username>
      password: <password>
      dbTable: <table-name>
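
For orientation, the same block with every placeholder filled in might look like the following. The cluster endpoint, bucket, credentials, and table name are illustrative values only; 5439 is the default Redshift port.

inputs:
  - name: oms_transactions_data
    inputType: redshift
    redshift:
      jdbcUrl: jdbc:redshift://examplecluster.abc123xyz789.us-east-1.redshift.amazonaws.com:5439/dev
      tempDir: s3a://example-temp-bucket/redshift-staging
      username: example_user
      password: example_password
      dbTable: oms_transactions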

Sample YAML for Reading from Redshift Data Source

version: v1
name: standalone-read-redshift
type: workflow
tags:
  - standalone
  - readJob
  - redshift
title: Read from Redshift using standalone mode
description: |
  The purpose of this workflow is to read from Redshift and write to local storage.
workflow:
  dag:
    - name: write-redshift-local
      title: Read from Redshift using standalone mode
      description: |
        The purpose of this job is to read from Redshift and write to local storage.
      spec:
        tags:
          - standalone
          - readJob
          - redshift
        stack: flare:3.0

        envs: # Environment Variables
          DISABLE_HADOOP_PATH_CHECKS: true
          # Without this environment variable, the job will fail to establish
          # a connection with Redshift in standalone mode.
        compute: runnable-default
        flare:
          job:
            explain: true
            logLevel: INFO

            inputs: # Read from Redshift
              - name: oms_transactions_data
                inputType: redshift
                redshift:
                  jdbcUrl: jdbc:redshift://<redshift-host-address>/<database>
                  tempDir: s3a://<bucket>/<dir>
                  username: <username>
                  password: <password>
                  dbTable: <table>

            outputs: # Write to Local
              - name: finalDf
                outputType: file
                file:
                  format: iceberg
                  warehousePath: /data/examples/dataout/redshift
                  schemaName: default
                  tableName: trans_oms_data3
                  options:
                    saveMode: append

            steps:
              - sequence:
                  - name: finalDf
                    sql: SELECT * FROM oms_transactions_data LIMIT 10

          sparkConf:
            - 'spark.hadoop.fs.s3a.bucket.<bucket-name>.access.key': '<access-key>'
            - 'spark.hadoop.fs.s3a.bucket.<bucket-name>.secret.key': '<secret-key>'
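
The per-bucket properties above scope the S3 credentials to the bucket used for tempDir. If the same credentials apply to every S3 path the job touches, the bucket-agnostic Hadoop S3A properties can be used instead; this is a sketch, and the values remain placeholders to be supplied:

          sparkConf:
            - 'spark.hadoop.fs.s3a.access.key': '<access-key>'
            - 'spark.hadoop.fs.s3a.secret.key': '<secret-key>'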

Write Config

Output Section Configuration for Writing to Redshift Data Source

outputs:
  - name: finalDf
    outputType: redshift
    redshift:
      jdbcUrl: jdbc:redshift://<redshift-host-address>/<database>
      tempDir: s3a://<bucket>/<dir>
      username: <username>
      password: <password>
      dbTable: <table>
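
As with the read configuration, a filled-in output block is shown below for orientation. The endpoint, bucket, credentials, and target table are illustrative values, and the schema-qualified dbTable is an assumption about where the target table lives; a bare table name, as in the block above, also works.

outputs:
  - name: finalDf
    outputType: redshift
    redshift:
      jdbcUrl: jdbc:redshift://examplecluster.abc123xyz789.us-east-1.redshift.amazonaws.com:5439/dev
      tempDir: s3a://example-temp-bucket/redshift-staging
      username: example_user
      password: example_password
      dbTable: public.trans_oms_data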

Sample YAML for Writing to Redshift Data Source

version: v1
name: standalone-write-redshift
type: workflow
tags:
  - standalone
  - writeJob
  - redshift
title: Write to Redshift in standalone mode
description: |
  The purpose of this workflow is to read from local storage and write to Redshift.
workflow:
  dag:
    - name: standalone-redshift-write
      title: Write to Redshift using standalone mode
      description: |
        The purpose of this job is to read from local storage and write to Redshift.
      spec:
        tags:
          - standalone
          - writeJob
          - redshift
        stack: flare:3.0

        envs: # Environment Variables
          DISABLE_HADOOP_PATH_CHECKS: true
          # Without this environment variable, the job will fail to establish
          # a connection with Redshift in standalone mode.

        compute: runnable-default
        flare:
          job:
            explain: true
            logLevel: INFO

            inputs: # Read from Local
              - name: oms_transactions_data
                inputType: file
                file:
                  path: /data/examples/default/city
                  format: csv

            outputs: # Write to Redshift
              - name: finalDf
                outputType: redshift
                redshift:
                  jdbcUrl: jdbc:redshift://<redshift-host-address>/<database>
                  tempDir: s3a://<bucket>/<dir>
                  username: <username>
                  password: <password>
                  dbTable: <table-name>

            steps:
              - sequence:
                  - name: finalDf
                    sql: SELECT * FROM oms_transactions_data LIMIT 10

          sparkConf:
            - 'spark.hadoop.fs.s3a.bucket.<bucket-name>.access.key': '<access-key>'
            - 'spark.hadoop.fs.s3a.bucket.<bucket-name>.secret.key': '<secret-key>'