Setting Up Nilus¶
The Nilus stack in DataOS is designed for developers and enables data movement from a source to a destination without requiring complex code. It provides built-in connectors for commonly used data sources, along with the ability to add custom sources and destinations for batch data movement.
Prerequisites¶
Deployment of the Nilus Server requires appropriate access permissions. These permissions must be granted either through predefined use cases or by assigning the operator tag. Only users with the operator role are authorized to deploy Nilus Server.
Note: Permission requirements may vary across organizations, depending on role configurations and access policies defined within each DataOS environment.
Verifying User Permissions
To verify whether the required access permissions or operator tag are present, execute the following command:
dataos-ctl user get
# Expected Output:
dataos-ctl user get
INFO[0000] 😃 user get...
INFO[0000] 😃 user get...complete
NAME │ ID │ TYPE │ EMAIL │ TAGS
───────────────┼─────────────┼────────┼──────────────────────┼─────────────────────────────────
Groot │ iamgroot │ person │ iamgroot@tmdc.io │ roles:id:data-dev,
│ │ │ │ roles:id:operator,
│ │ │ │ roles:id:system-dev,
│ │ │ │ roles:id:user,
│ │ │ │ users:id:iamgroot
In the example above, the presence of roles:id:operator indicates that the user is authorized to deploy the Nilus Server.
Error on Missing Permissions
If a deployment attempt is made without the required permissions or operator tag, the following error will be returned:
⚠️ Invalid Parameter - failure validating resource : could not get the secret map for the dataos address: 'dataos://nilusdb?acl=rw' err: 403 <nil>
WARN[0001] 🛠 apply...error
ERRO[0001] failure applying resources
Resolution
If the required tags or access permissions are not assigned, contact a DataOS Administrator or Operator to request the necessary access.
Sample Nilus Server¶
The Nilus Server is a FastAPI service responsible for capturing and managing pipeline metadata. It must be deployed and running before the Nilus Stack, as the stack depends on the server’s API endpoint. The following template defines the Nilus Server service used to capture pipeline information.
Nilus Server Manifest
name: nilus-server
version: v1
type: service
tags:
- service
- nilus-server
description: Nilus Server for capturing pipeline information
workspace: system
service:
servicePort: 8000
ingress:
enabled: true
path: /nilus/system:nilus-server
stripPath: false
noAuthentication: false
replicas: 1
logLevel: INFO
compute: runnable-default
resources:
requests:
cpu: 1000m
memory: 1000Mi
limit:
cpu: 2000m
memory: 2000Mi
stack: container
envs:
NILUS_DB_DEPOT_ADDRESS: dataos://nilusdb?acl=rw
NILUS_SERVER_PORT: 8000
stackSpec:
image: 933570616564.dkr.ecr.us-west-2.amazonaws.com/released-images/nilus:0.1.1
command:
- /bin/sh
- -c
arguments:
- |
python3 /app/nilus_server/entrypoint.py run-migration \
&& exec python3 /app/nilus_server/entrypoint.py run-server
Replace all placeholder values—with the appropriate environment-specific configurations.
Configuration Requirement
The default workspace for Nilus components is system. When deploying the nilus-server in a non-default workspace, both the ingress.path and the Nilus Stack YAML must be updated to reflect the correct workspace reference.
Expected Format:
Example:
For a workspace named analytics, the path should be:
Common Environment Variables in the Server
NILUS_DB_DEPOT_ADDRESS: Defines the DataOS depot location for Nilus database connections.NILUS_SERVER_PORT: Specifies the port on which the Nilus Server listens.
Sample Nilus Stack¶
Once the Nilus Server is active, use the following template as a reference to create a Nilus Stack. This configuration utilizes the Nilus Python entry point, thereby removing the requirement for a custom wrapper script.
Nilus Stack Manifest
name: "nilus-stack"
version: "v1alpha"
type: "stack"
layer: "user"
description: "DataOS nilus stack v1alpha version 2 with schema enforcement"
stack:
name: "nilus"
version: "1.0"
reconciler: "stackManager"
dataOsAddressJqFilters:
- .source.address
- .sink.address
secretProjection:
type: "propFile"
image:
registry: "933570616564.dkr.ecr.us-west-2.amazonaws.com"
repository: "released-images"
image: "nilus"
tag: "0.1.1"
command:
- python3
arguments:
- /app/nilus/main.py
environmentVars:
DATAOS_WORK_DIR: /etc/dataos/work
BASE_CONFIG_FILE_DIR: /etc/dataos/config
DEBEZIUM_PROPS_PATH: /etc/dataos/debezium
NILUS_SERVICE_URL: "http://nilus-server.system.svc.cluster.local:8000/nilus/system:nilus-server"
NILUS_HOSTNAME_VERIFICATION_ENABLED: "false"
stackSpecValueSchema:
jsonSchema: |
{ "$schema": "http://json-schema.org/draft-07/schema#", "type": "object", "properties": { "source": { "type": "object", "properties": { "address": { "type": "string" }, "options": { "type": "object", "properties": { "extract-parallelism": { "type": "integer" }, "incremental-key": { "type": "string" }, "interval-end": { "type": "string" }, "interval-start": { "type": "string" }, "mask": { "type": "object" }, "max-table-nesting": { "type": "string" }, "page-size": { "type": "integer" }, "primary-key": { "type": "string" }, "source-table": { "type": "string" }, "sql-exclude-columns": { "type": "string" }, "sql-limit": { "type": "integer" }, "sql-reflection-level": { "type": "string" }, "type-hints": { "type": "object" }, "yield-limit": { "type": "integer" }, "type": { "type": "string" }, "engine": { "type": "string" } }, "additionalProperties": true } }, "required": [ "address" ] }, "sink": { "type": "object", "properties": { "address": { "type": "string" }, "options": { "type": "object", "properties": { "cluster-by": { "items": { "type": "string" }, "type": "array" }, "dest-table": { "type": "string" }, "full-refresh": { "type": "boolean" }, "incremental-strategy": { "type": "string", "enum": [ "replace", "append", "merge" ] }, "loader-file-size": { "type": "integer" }, "partition-by": { "type": "array", "items": { "type": "object", "properties": { "column": { "type": "string" }, "type": { "type": "string", "enum": [ "year", "month", "day", "hour", "bucket", "identity" ] }, "name": { "type": "string" }, "index": { "type": "integer" }, "bucket-count": { "type": "integer" } }, "required": [ "column", "type", "index", "name" ] } }, "staging-bucket": { "type": "string" } }, "additionalProperties": false } }, "required": [ "address" ] }, "repo": { "type": "object", "properties": { "url": { "type": "string" }, "syncFlags": { "type": "array", "items": { "type": "string" } }, "baseDir": { "type": "string" } }, "required": [ "url", "baseDir" ] } }, "required": [ "source", "sink" ] }
workflowJobConfig:
configFileTemplate: |
config.yaml: |
{{ toYaml .ApplicationSpec.StackSpec | indent 2 }}
containerResourceTemplate: |
{{- if .ApplicationSpec.StackSpec.repo }}
{{- $repo := .ApplicationSpec.StackSpec.repo }}
initContainers:
- command:
- /git-sync
args:
- --repo={{ getNested $repo "url" }}
- --one-time=true
- --root=/etc/dataos/work
{{- if hasKey $repo "syncFlags" }}
{{- range $flag := getNested $repo "syncFlags" }}
- {{ $flag }}
{{- end }}
{{- end }}
image: 933570616564.dkr.ecr.us-west-2.amazonaws.com/released-images/git-sync:latest
imagePullPolicy: IfNotPresent
name: "{{.Name}}{{.Stamp}}-ic"
{{ if .ApplicationSpec.Resources -}}
resources:
{{toYaml .ApplicationSpec.Resources | indent 4}}
{{- end }}
volumeMounts:
- mountPath: /etc/dataos/work
name: workdir
envFrom:
- secretRef:
name: "{{.Name}}-{{.Type}}{{.Stamp}}-env"
{{ if .EnvironmentVarsFromSecret }}
{{- range $secName := .EnvironmentVarsFromSecret }}
- secretRef:
name: "{{$secName}}"
{{- end }}
{{- end }}
{{- end }}
container:
name: {{.Name}}{{.Stamp}}-main
image: "{{.Image}}"
imagePullPolicy: IfNotPresent
command:
- python
args:
- /app/nilus/main.py
envFrom:
- secretRef:
name: "{{.Name}}-{{.Type}}{{.Stamp}}-env"
{{ if .EnvironmentVarsFromSecret }}
{{- range $secName := .EnvironmentVarsFromSecret }}
- secretRef:
name: "{{$secName}}"
{{- end }}
{{- end }}
volumeMounts:
- mountPath: "{{.DataOsSecretMountPath}}"
name: dataos-secret-mount
readOnly: true
- name: workdir
mountPath: /etc/dataos/work
{{ if .HasConfigConfs }}
- name: dataos-config-mount
mountPath: "{{.DataOsConfigMountPath}}"
readOnly: true
{{- end }}
{{ if .ApplicationSpec.TempVolume -}}
- name: {{.Name}}-{{.Type}}{{.Stamp}}-tdm
mountPath: "{{.DataTempMountPath}}"
subPath: {{.Name}}{{.Stamp}}
{{- end }}
{{ if .ApplicationSpec.PersistentVolume -}}
- name: {{.Name}}-{{.Type}}{{.Stamp}}-pdm
mountPath: "{{.DataPersistentMountPath}}/{{.ApplicationSpec.PersistentVolume.Directory}}"
subPath: "{{.ApplicationSpec.PersistentVolume.Directory}}"
{{- end }}
{{ if .Volumes }}
{{- range $volume := .Volumes }}
- name: {{$volume.Name}}
mountPath: "{{$volume.MountPath}}"
readOnly: {{$volume.ReadOnly}}
{{ if $volume.SubPath }}
subPath: "{{$volume.SubPath}}"
{{- end }}
{{- end }}
{{- end }}
{{ if .ApplicationSpec.Resources -}}
resources:
{{toYaml .ApplicationSpec.Resources | indent 4}}
{{- end }}
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
{{ if .ApplicationSpec.EnvironmentVars }}
{{- range $conf, $value := .ApplicationSpec.EnvironmentVars }}
- name: {{$conf}}
value: {{ $value | quote }}
{{- end }}
{{- end}}
serviceConfig:
configFileTemplate: |
config.yaml: |
{{ toYaml .ApplicationSpec.StackSpec | indent 2 }}
containerResourceTemplate: |
{{- if hasKey .ApplicationSpec.StackSpec "repo" }}
{{- $repo := .ApplicationSpec.StackSpec.repo }}
initContainers:
- command:
- /git-sync
args:
- --repo={{ getNested $repo "url" }}
- --one-time=true
- --root=/etc/dataos/work
{{- if hasKey $repo "syncFlags" }}
{{- range $flag := getNested $repo "syncFlags" }}
- {{ $flag }}
{{- end }}
{{- end }}
image: 933570616564.dkr.ecr.us-west-2.amazonaws.com/released-images/git-sync:latest
imagePullPolicy: IfNotPresent
name: "{{.Name}}{{.Stamp}}-ic"
{{ if .ApplicationSpec.Resources -}}
resources:
{{toYaml .ApplicationSpec.Resources | indent 4}}
{{- end }}
volumeMounts:
- mountPath: /etc/dataos/work
name: workdir
envFrom:
- secretRef:
name: "{{.Name}}-{{.Type}}{{.Stamp}}-env"
{{ if .EnvironmentVarsFromSecret }}
{{- range $secName := .EnvironmentVarsFromSecret }}
- secretRef:
name: "{{$secName}}"
{{- end }}
{{- end }}
{{- end }}
container:
name: {{.Name}}{{.Stamp}}-main
image: "{{.Image}}"
imagePullPolicy: IfNotPresent
command:
- python
args:
- /app/nilus/main.py
envFrom:
- secretRef:
name: "{{.Name}}-{{.Type}}{{.Stamp}}-env"
{{ if .EnvironmentVarsFromSecret }}
{{- range $secName := .EnvironmentVarsFromSecret }}
- secretRef:
name: "{{$secName}}"
{{- end }}
{{- end }}
volumeMounts:
- mountPath: "{{.DataOsSecretMountPath}}"
name: dataos-secret-mount
readOnly: true
- name: workdir
mountPath: /etc/dataos/work
{{ if .HasConfigConfs }}
- name: dataos-config-mount
mountPath: "{{.DataOsConfigMountPath}}"
readOnly: true
{{- end }}
{{ if .ApplicationSpec.TempVolume -}}
- name: {{.Name}}-{{.Type}}{{.Stamp}}-tdm
mountPath: "{{.DataTempMountPath}}"
subPath: {{.Name}}{{.Stamp}}
{{- end }}
{{ if .ApplicationSpec.PersistentVolume -}}
- name: {{.Name}}-{{.Type}}{{.Stamp}}-pdm
mountPath: "{{.DataPersistentMountPath}}/{{.ApplicationSpec.PersistentVolume.Directory}}"
subPath: "{{.ApplicationSpec.PersistentVolume.Directory}}"
{{- end }}
{{ if .Volumes }}
{{- range $volume := .Volumes }}
- name: {{$volume.Name}}
mountPath: "{{$volume.MountPath}}"
readOnly: {{$volume.ReadOnly}}
{{ if $volume.SubPath }}
subPath: "{{$volume.SubPath}}"
{{- end }}
{{- end }}
{{- end }}
{{ if .ApplicationSpec.Resources -}}
resources:
{{toYaml .ApplicationSpec.Resources | indent 4}}
{{- end }}
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
{{ if .ApplicationSpec.EnvironmentVars }}
{{- range $conf, $value := .ApplicationSpec.EnvironmentVars }}
- name: {{$conf}}
value: {{ $value | quote }}
{{- end }}
{{- end}}
Replace all placeholder values—including image, registry, and URLs—with the appropriate environment-specific configurations.
Configuration Requirement
If the nilus-server deployed in a non-default workspace, the NILUS_SERVICE_URL environment variable in the Nilus Stack must be updated to target the appropriate workspace.
Expected Format:
Example: For a workspace named analytics, the environment variable should be:
Common Environment Variables in the Stack
-
DATAOS_WORK_DIR: Specifies the working directory used for runtime operations and intermediate files. -
BASE_CONFIG_FILE_DIR: Defines the directory containing configuration files mounted into the container. -
DEBEZIUM_PROPS_PATH: Indicates the location of Debezium properties used for CDC integration. -
NILUS_SERVICE_URL: Sets the internal service endpoint for communication with thenilus-server. Must follow the DataOS service URL format. -
NILUS_HOSTNAME_VERIFICATION_ENABLED: Controls hostname verification for internal service communication. Set to"false"to disable verification.