Azure Synapse¶
Azure Synapse Analytics is a cloud-based analytics service that integrates data warehousing and big data analytics.
Nilus supports Synapse as a batch ingestion source and sink, allowing seamless data movement between Serverless SQL Pools or Dedicated SQL Pools and the DataOS Lakehouse or other supported destinations.
Prerequisites¶
The following are the requirements for enabling Batch Data Movement in Azure Synapse:
Authentication Method¶
Nilus supports the following authentication methods:
- SQL Authentication (username/password)
- Azure AD Access Token
- Managed Identity (for integrated authentication)
Required Parameters¶
The following are the required parameters:
username
: SQL or Azure AD user

password
: Password (for SQL authentication)

server
: Synapse workspace hostname

database
: Target database

driver
: ODBC driver (default: ODBC Driver 18 for SQL Server)
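For illustration, these parameters map onto a standard ODBC connection string for SQL Server-family endpoints such as Synapse. All values below are placeholders, not real endpoints or credentials:

Driver={ODBC Driver 18 for SQL Server};Server=tcp:yourworkspace.sql.azuresynapse.net,1433;Database=yourdatabase;Uid=yourusername;Pwd=yourpassword;Encrypt=yes;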
Required Permissions¶
The following are the essential permissions:
- Database-level CONNECT and USAGE
- Schema-level SELECT
- Table-level SELECT
Pre-created Azure Synapse Depot¶
A Depot must already exist in DataOS with read-write access. To verify the Depot, check the Metis UI of DataOS or run the following command:
dataos-ctl resource get -t depot -a
#Expected Output
INFO[0000] 🔍 get...
INFO[0000] 🔍 get...complete
| NAME | VERSION | TYPE | STATUS | OWNER |
| ------------ | ------- | ----- | ------ | -------- |
| synapsedepot | v2alpha | depot | active | usertest |
If the Depot has not been created, use the following manifest template to create the Azure Synapse Depot (JDBC type):
Azure Synapse Depot Manifest
name: ${{depot-name}}
version: v2alpha
type: depot
description: ${{description}}
tags:
  - ${{tag1}}
  - ${{tag2}}
owner: ${{owner-name}}
layer: user
depot:
  type: JDBC
  external: ${{true}}
  compute: ${{runnable-default}}
  jdbc:
    subprotocol: sqlserver
    host: synapse-test-batch.sql.azuresynapse.net
    port: ${{port}}
    database: ${{database}}
  secrets:
    - name: ${{instance-secret-name}}-r
      allkeys: true
    - name: ${{instance-secret-name}}-rw
      allkeys: true
Info
- Question: What is the server name for synapse-test-batch?
  Answer: The serverless (on-demand) SQL endpoint of a Synapse workspace follows the convention yourserver-ondemand.sql.azuresynapse.net.
- Update variables such as `name`, `owner`, `compute`, `layer`, `host`, `port`, `database`, etc., and contact the DataOS Administrator or Operator to obtain the appropriate secret name.
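The secrets referenced in the manifest above are DataOS Instance Secrets. As a minimal sketch, assuming the standard key-value-properties format, the read-only secret might look like the following; the read-write variant is identical except for acl: rw. Key names and values are illustrative and should be confirmed with the DataOS Administrator:

name: ${{instance-secret-name}}-r
version: v1
type: instance-secret
description: Read-only credentials for the Azure Synapse Depot
layer: user
instance-secret:
  type: key-value-properties
  acl: r
  data:
    username: ${{username}}
    password: ${{password}}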
Alternatively, Nilus can connect to Synapse directly through a connection URI, enabling seamless integration with enterprise identity management systems. For instance:
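The exact URI scheme accepted by Nilus may vary by release; as an illustrative sketch only, a SQL Server-style URI assembled from the parameters listed under Required Parameters and supplied in place of a Depot address could look like the following (all values are placeholders):

synapse://${{username}}:${{password}}@${{workspace}}.sql.azuresynapse.net:1433/${{database}}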
Sample Workflow Config¶
name: nb-synapse-test-01
version: v1
type: workflow
tags:
  - workflow
  - nilus-batch
description: Nilus Batch Workflow Sample for Azure Synapse to DataOS Lakehouse
workspace: research
workflow:
  dag:
    - name: nb-job-01
      spec:
        stack: nilus:1.0
        compute: runnable-default
        logLevel: INFO
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
        stackSpec:
          source:
            address: dataos://synapse_depot
            options:
              source-table: sales.orders
              incremental-key: modified_date
          sink:
            address: dataos://testawslh
            options:
              dest-table: synapse_retail.batch_orders
              incremental-strategy: append
Info
Ensure that all placeholder values and required fields (e.g., connection addresses, table names, and access credentials) are properly updated before applying the configuration to a DataOS workspace.
Deploy the manifest file using the following command:
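Assuming the Workflow manifest above is saved locally (the file name below is illustrative) and is deployed to the research workspace:

dataos-ctl resource apply -f nb-synapse-test-01.yaml -w research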
Supported Attribute Details¶
Nilus supports the following source options for Azure Synapse:
| Option | Required | Description |
| --- | --- | --- |
| `source-table` | Yes | Table name (`schema.table`) |
| `incremental-key` | No | Column used for incremental ingestion |
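As a quick reference, these options sit under the source block of the stackSpec. The table and column names below are illustrative; since `incremental-key` is optional, omitting it results in a non-incremental (full) load:

source:
  address: dataos://synapse_depot
  options:
    source-table: retail.customers      # schema.table
    incremental-key: last_updated_at    # optional; enables incremental ingestion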
Core Concepts
- Serverless SQL Pool
    - Pay-per-query pricing model.
    - Direct query execution over data in Azure Data Lake.
    - Best suited for ad-hoc analytics and cost-sensitive workloads.
- Dedicated SQL Pool
    - Pre-provisioned, scalable compute resources.
    - Optimized for high-performance workloads.
    - Supports distribution strategies, columnstore indexes, and workload isolation.
- Incremental Loading
    - Supported using timestamp or sequential ID columns.
    - Common keys: `modified_date` or `last_updated_at`.