Skip to content

Redshift

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in AWS that allows fast query performance and scalability. Nilus supports Redshift as both a batch ingestion source and a destination, enabling seamless movement of data between Redshift and the DataOS Lakehouse or other supported sinks.

Nilus connects to Redshift either via a direct connection URI or through DataOS Depot, which centralizes authentication, credentials, and connection details.

Info

  • Contact the DataOS Administrator or Operator to obtain configured URI/Depot UDL and other required parameters.

Prerequisites

Following are the requirements for enabling Batch Data Movement in Redshift:

Database User Permissions

The Redshift user must be granted the following privileges:

-- Database-level access
GRANT USAGE ON DATABASE <database_name> TO <username>;

-- Schema-level access
GRANT USAGE ON SCHEMA <schema_name> TO <username>;

-- Table-level access
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <username>;

-- Additional
GRANT SELECT ON <schema_name>.<table_name> TO <username>;

Pre-created Redshift Depot

A Depot must exist in DataOS with read-write access. To check the Depot, go to the Metis UI of the DataOS or use the following command:

dataos-ctl resource get -t depot -a

#Expected Output
NFO[0000] 🔍 get...                                     
INFO[0000] 🔍 get...complete 
| NAME          | VERSION | TYPE  | STATUS | OWNER    |
| ------------- | ------- | ----- | ------ | -------- |
| redshiftdepot | v2alpha | depot | active | usertest |

If the Depot is not created use the following manifest configuration template to create the Redshift Depot:

Redshift Depot Manifest
name: ${{redshift-depot-name}}
version: v2alpha
type: depot
tags:
    - ${{redshift}}
description: ${{Redshift Sample data}}
owner: ${{owner-name}}
layer: user
depot:
  type: REDSHIFT
  external: ${{true}}
  secrets:
    - name: ${{redshift-instance-secret-name}}-r
      allkeys: true
    - name: ${{redshift-instance-secret-name}}-rw
      allkeys: true
  redshift:
    host: ${{hostname}}
    subprotocol: ${{subprotocol}}
    port: ${{5439}}
    database: ${{sample-database}}
    bucket: ${{tmdc-dataos}}
    relativePath: ${{development/redshift/data_02/}}

Info

Update variables such as name, owner, and layer, and contact the DataOS Administrator or Operator to obtain the appropriate secret name.

Sample Workflow Config

name: nb-redshift-test-01
version: v1
type: workflow
tags:
    - workflow
    - nilus-batch
description: Nilus Batch Workflow Sample for AWS Redshift to DataOS Lakehouse
workspace: research
workflow:
  dag:
    - name: nb-job-01
      spec:
        stack: nilus:1.0
        compute: runnable-default
        logLevel: INFO
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
        stackSpec:
          source:
            address: dataos://redshift_depot
            options:
              source-table: analytics.orders
              incremental-key: updated_at
          sink:
            address: dataos://testinglh
            options:
              dest-table: redshift_retail.batch_orders
              incremental-strategy: append
              aws_region: us-west-2

Info

Ensure that all placeholder values and required fields (e.g., connection addresses, slot names, and access credentials) are properly updated before applying the configuration to a DataOS workspace.

Deploy the manifest file using the following command:

dataos-ctl resource apply -f ${{path to the Nilus Workflow YAML}}

Supported Attributes Details

Nilus supports the following source options for Redshift:

Options Required Description
source-table Yes Table name (schema.table) or query prefixed with query:
incremental-key No Column name for incremental ingestion

Info

Incremental Loading

  • Supported using timestamp or sequential ID columns (e.g., updated_at).
  • The default incremental key for usage tracking is starttime.