Attributes of Cluster manifest¶

Structure of Cluster-specific Section¶

cluster:
  compute: ${{query-default}} # mandatory
  runAsApiKey: ${{abcdefghijklmnopqrstuvwxyz}} # mandatory
  runAsUser: ${{minerva-cluster}}
  maintenance:
    restartCron: ${{'13 1 */2 * *'}}
    timezone: ${{Asia/Kolkata}} # mandatory
    scalingCrons: 
      - cron: ${{'5/10 * * * *'}} # mandatory
        timezone: ${{Europe/Berlin}} # mandatory
        replicas: ${{2}}
        resources: 
          requests: 
            cpu: ${{800m}}
            memory: ${{1Gi}}
          limits: 
            cpu: ${{1000m}}
            memory: ${{2Gi}}    
  minerva: 
    replicas: ${{2}} # mandatory
    resources:
    secrets:
      - ${{mysecret}}
    depots: # mandatory
      - address: ${{dataos://lakehouse:default}} # mandatory
        properties:
          iceberg.file-format: ${{PARQUET}} 
          iceberg.compression-codec: ${{GZIP}} 
          hive.config.resources: ${{"/usr/trino/etc/catalog/core-site.xml"}}
        secrets: 
          - name: ${{newsecret}} # mandatory
            workspace: ${{curriculum}}
            key: ${{newone}}
            keys: 
              - ${{newone}}
              - ${{oldone}}
            allKeys: ${{true}}
            consumptionType: ${{envVars}}
    catalogs:
      - name: ${{cache}} # mandatory
        type: ${{memory}} # mandatory
        properties: 
          memory.max-data-per-node: ${{"128MB"}} 
        secrets: 
          - name: ${{newsecret}} # mandatory
            workspace: ${{curriculum}}
            key: ${{newone}}
            keys: 
              - ${{newone}}
              - ${{oldone}}
            allKeys: ${{true}}
            consumptionType: ${{envVars}}
    debug:
      logLevel: ${{INFO}}
      trinoLogLevel: ${{ERROR}}

    coordinatorEnvs:
      ${{alpha: beta}} 
    workerEnvs:
      ${{gamma: sigma}} 
    overrideDefaultEnvs: true
    spillOverVolume: twenty
    selector:
      users: # mandatory
        - ${{"**"}}
      tags: # mandatory
        - ${{alpha}}
        - ${{beta}}
      sources: # mandatory
        - ${{scanner/**}}
        - ${{flare/**}}
      match: ${{''}} # mandatory
      priority: ${{'10'}} # mandatory

  nats:
    replicas: ${{5}} # mandatory
    volumeType: ${{newone}} # mandatory
    volumeSize: ${{30mi}} # mandatory
    maxConnections: ${{4}} # mandatory
    roles: # mandatory
      - name: ${{newrole}} # mandatory
        permissions: 
          publish: # mandatory
            - ${{alpha}}
            - ${{beta}}
          subscribe: # mandatory
            - ${{newone}}
            - ${{develop}}
          allow_responses: ${{true}}
  jupyterHub:
    ingress: 
      enabled: ${{true}}
      path: ${{/strip/}}
      stripPath: ${{false}}
      noAuthentication: ${{true}}
      appDetailSpec: ${{random}}
      apiDetailSpec: ${{random}}
    oidcConfig: # mandatory
      clientId: ${{hello}} # mandatory
      clientSecret: ${{delta}} # mandatory
    storageClass: ${{alpha}} # mandatory
    singleUserConfig: # mandatory
      volumeCapacity: ${{alpha}} # mandatory

Configuration Attributes¶

`cluster`¶

Description: the cluster mapping/section defines configurations for the Cluster Resource.

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

cluster:
  compute: query-default
  runAsUser: minerva-cluster
  minerva:
    selector:
      users:
        - "**"
      sources:
        - scanner/**
        - flare/**
    replicas: 2
    match: ''
    priority: '10'
    runAsApiKey: dataos apikey
    runAsUser: iamgroot
    resources:
      limits:
        cpu: 4000m
        memory: 8Gi
      requests:
        cpu: 1200m
        memory: 2Gi
    debug:
      logLevel: INFO
      trinoLogLevel: ERROR
    depots:
      - address: dataos://lakehouse:default
        properties:
          iceberg.file-format: PARQUET
          iceberg.compression-codec: GZIP
          hive.config.resources: "/usr/trino/etc/catalog/core-site.xml"
      - address: dataos://bqdepot:default
    catalogs:
      - name: cache
        type: memory
        properties:
          memory.max-data-per-node: "128MB"

`compute`¶

Description: the compute attribute specifies the name of the Compute Resource instance referenced by the Cluster.

Data Type	Requirement	Default Value	Possible Value
string	mandatory	query-default	any valid query-type Compute Resource-instance name

Example Usage:

cluster:
  compute: query-default

`runAsApiKey`¶

Description: the runAsApiKey attribute allows a user to assume the identity of another user through the provision of the latter's API key.

Data Type	Requirement	Default Value	Possible Value
string	mandatory	abcdefghijklmnopqrstuvwxyz	any valid DataOS user API key

Additional Details: The apikey can be obtained by executing the following command from the CLI:

dataos-ctl user apikey get

If no apikey is available, create a new one with the following command:

dataos-ctl user apikey create -n ${{name of the apikey}} -d ${{duration for the apikey to live}}

Example Usage:

cluster:
  runAsApiKey: abcdefghijklmnopqrstuvwxyz

`runAsUser`¶

Description: when the runAsUser attribute is configured with the UserID of the use-case assignee, it grants the authority to perform operations on behalf of that user.

Data Type	Requirement	Default Value	Possible Value
string	optional	user-id of the user	user-id of the use-case assignee

Example Usage:

cluster:
  runAsUser: iamgroot

`maintenance`¶

Available in DataOS CLI Version 2.8.2 and DataOS Version 1.10.41

Description: The maintenance section groups Cluster maintenance configurations that let Poros, the DataOS orchestrator, simplify and automate routine operator activities. The maintenance features are invoked on a cron schedule, and each one triggers a restart or a scaling operation on the specific Cluster being configured.

Data Type	Requirement	Default Value	Possible Value
mapping	optional	none	none

Example Usage:

cluster:
  maintenance:
    restartCron: '13 1 */2 * *'
    timezone: Europe/Berlin # mandatory
    scalingCrons: 
      - cron: '5/10 * * * *' # mandatory
        timezone: Europe/Berlin # mandatory
        replicas: 2
        resources: 
          requests: 
            cpu: 800m
            memory: 1Gi
          limits: 
            cpu: 1000m
            memory: 2Gi

`restartCron`¶

Description: The restartCron attribute specifies the cron schedule for Cluster restarts. Poros, the DataOS orchestrator, restarts the Cluster on the specified schedule.

Data Type	Requirement	Default Value	Possible Value
string	mandatory	none	any valid cron expression

Example Usage:
To restart the Cluster at 1:13 AM every other day, specify:

cluster:
  maintenance:
    restartCron: '13 1 */2 * *'
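
To see which days a schedule such as the one above selects, the cron fields can be expanded by hand. The following is a minimal Python sketch, assuming the common five-field cron semantics in which `*/n` and `a/n` mean "every n starting from the minimum or from a"; `expand_field` is a hypothetical helper written for illustration and is not part of DataOS or Poros.

```python
def expand_field(field: str, lo: int, hi: int) -> list[int]:
    """Expand one cron field (e.g. '*/2', '5/10', '1-5', '13') into the values it matches."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # Assumption: 'a/n' runs from a up to the field maximum; a bare 'a' is just a.
            end = hi if step_text else start
        values.update(range(start, end + 1, step))
    return sorted(values)


minute, hour, day_of_month, month, day_of_week = "13 1 */2 * *".split()
restart_days = expand_field(day_of_month, 1, 31)
print(expand_field(minute, 0, 59), expand_field(hour, 0, 23))  # [13] [1]
print(restart_days)  # the odd days of the month, from 1 to 31
```

Under this reading the restart fires on the odd days of each month, so in a 31-day month it runs on the 31st and again on the 1st; "every other day" holds only within a month.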

`timezone`¶

Description: The timezone attribute specifies the Cluster's timezone.

Data Type	Requirement	Default Value	Possible Value
string	mandatory	none	any valid timezone from the tz database

Example Usage:

cluster:
  maintenance:
    timezone: Asia/Kolkata

`scalingCrons`¶

Description: The scalingCrons attribute defines configurations for scaling the Cluster. Poros can horizontally and/or vertically scale the Cluster based on the provided configuration.

Data Type	Requirement	Default Value	Possible Value
list of mappings	optional	none	none

Each scaling cron includes the following attributes:

cron: The cron schedule for the job.
timezone: The timezone for the job.
replicas: The number of replicas.
resources: Resource specifications for the job, including requests and limits for CPU and memory.

Additional Information: While a scalingCron is in its "active" cron window, it overrides the default replicas and/or resources of a Cluster such as Minerva. When a cron schedule triggers, the supplied replicas and resources take effect and stay in effect until another cron schedule triggers. To clear an active scalingCron, clear out the scalingCrons section and apply the Resource again.

Example Usage:

Horizontal Scaling: To switch the Cluster between 3 and 0 replicas using two crons that fire 5 minutes apart, specify:

cluster:
  maintenance:
    scalingCrons:
    - cron: '5/10 * * * *'
      timezone: Europe/Berlin
      replicas: 3
    - cron: '10/10 * * * *'
      timezone: Europe/Berlin
      replicas: 0
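
The override behaviour of these two crons can be traced minute by minute. The sketch below is illustrative only: it assumes that `5/10` in the minute field fires at minutes 5, 15, ..., 55 and `10/10` at minutes 10, 20, ..., 50, and the starting value of 2 replicas is a hypothetical default, not something Poros reports.

```python
# Assumption: 'start/step' in the minute field fires at start, start+step, ... up to 59.
SCALE_UP_MINUTES = set(range(5, 60, 10))     # cron '5/10 * * * *'  -> replicas: 3
SCALE_DOWN_MINUTES = set(range(10, 60, 10))  # cron '10/10 * * * *' -> replicas: 0


def replicas_at(minute: int, current: int) -> int:
    """Return the replica count after any cron due at this minute has fired."""
    if minute in SCALE_UP_MINUTES:
        return 3
    if minute in SCALE_DOWN_MINUTES:
        return 0
    return current  # no cron fired; the last applied value stays in effect


replicas = 2  # hypothetical default from the Cluster's own replicas setting
timeline = []
for minute in range(0, 30):
    replicas = replicas_at(minute, replicas)
    timeline.append(replicas)

print(timeline[4], timeline[5], timeline[9], timeline[10], timeline[15])  # 2 3 3 0 3
```

Because neither cron fires at minute 0 under this assumption, the 3 replicas applied at minute 55 would remain in effect until minute 10 of the next hour.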

Vertical Scaling: To switch the Cluster between two resource profiles using two crons that fire 5 minutes apart, specify:

cluster:
  maintenance:
    scalingCrons:
    - cron: '5/10 * * * *'
      timezone: Europe/Berlin
      resources:
        limits:
          cpu: 1000m
          memory: 2Gi
        requests:
          cpu: 800m
          memory: 1Gi
    - cron: '10/10 * * * *'
      timezone: Europe/Berlin
      resources:
        limits:
          cpu: 3000m
          memory: 7Gi
        requests:
          cpu: 1500m
          memory: 3Gi

`cron`¶

Description: specifies the cron schedule for scaling tasks in the Cluster.

Data Type	Requirement	Default Value	Possible Value
string	mandatory	none	any valid cron expression

Example Usage:

cluster:
  maintenance:
    scalingCrons:
      - cron: '5/10 * * * *'

`replicas`¶

Description: specifies the number of replicas for scaling tasks in the Cluster.

Data Type	Requirement	Default Value	Possible Value
integer	mandatory	1	1-4

Example Usage:

cluster:
  maintenance:
    scalingCrons:
      - replicas: 3

`resources`¶

Description: specifies the CPU and memory resource allocation for the Cluster.

Data Type	Requirement	Default Value	Possible Value
mapping	optional	none	none

Example Usage:

cluster:
  maintenance:
    scalingCrons:
      - resources:
          limits:
            cpu: 1000m
            memory: 2Gi
          requests:
            cpu: 800m
            memory: 1Gi

`limits`¶

Description: specifies the resource limits for CPU and memory for the specific Cluster.

Data Type	Requirement	Default Value	Possible Value
mapping	optional	none	none

Example Usage:

cluster:
  maintenance:
    scalingCrons:
      - resources:
          limits:
            cpu: 1000m
            memory: 2Gi

`requests`¶

Description: Specifies the resource requests for the cluster.

Data Type	Requirement	Default Value	Possible Value
mapping	optional	none	none

Example Usage:

cluster:
  maintenance:
    scalingCrons:
      - resources:
          requests:
            cpu: 800m
            memory: 1Gi

cpu

Description: specifies the CPU resource configuration for the Cluster.

Data Type	Requirement	Default Value	Possible Value
string	optional	requests: 100m, limits: 400m	cpu units in milliCPU(m) or CPU Core

Example Usage:

cluster:
  maintenance:
    scalingCrons:
      - resources:
          limits:
            cpu: 1000m

memory

Description: specifies the requested memory for scaling tasks in the Cluster.

Data Type	Requirement	Default Value	Possible Value
string	optional	requests: 100Mi, limits: 400Mi	memory in Mebibytes(Mi) or Gibibytes(Gi)

Example Usage:

cluster:
  maintenance:
    scalingCrons:
      - resources:
          limits:
            cpu: 1000m
            memory: 2Gi
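
The cpu and memory strings use milliCPU (m) or whole cores, and Mebibytes (Mi) or Gibibytes (Gi). A minimal Python sketch of how such values compare is shown here; the helper names are hypothetical, and the check that a request does not exceed its limit follows the Kubernetes convention, which this page does not state explicitly.

```python
from fractions import Fraction

MEMORY_UNITS = {"Mi": 2**20, "Gi": 2**30}


def cpu_cores(quantity: str) -> Fraction:
    """Convert '800m' (milliCPU) or '2' (whole cores) to a number of cores."""
    if quantity.endswith("m"):
        return Fraction(int(quantity[:-1]), 1000)
    return Fraction(quantity)


def memory_bytes(quantity: str) -> int:
    """Convert '1Gi' or '512Mi' to bytes; only the Mi and Gi suffixes are handled."""
    number, suffix = quantity[:-2], quantity[-2:]
    return int(number) * MEMORY_UNITS[suffix]


def requests_fit_limits(resources: dict) -> bool:
    """Check that each request is no larger than the matching limit."""
    requests, limits = resources["requests"], resources["limits"]
    return (cpu_cores(requests["cpu"]) <= cpu_cores(limits["cpu"])
            and memory_bytes(requests["memory"]) <= memory_bytes(limits["memory"]))


resources = {
    "requests": {"cpu": "800m", "memory": "1Gi"},
    "limits": {"cpu": "1000m", "memory": "2Gi"},
}
print(cpu_cores("800m"), memory_bytes("1Gi"), requests_fit_limits(resources))
# 4/5 1073741824 True
```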

`minerva`¶

Description: The minerva attribute defines configurations for the Minerva Cluster.

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

cluster:
  minerva: 
    replicas: 2 # mandatory
    resources:
    secrets:
      - mysecret
    depots: # mandatory
      - address: dataos://lakehouse:default # mandatory
        properties:
          iceberg.file-format: PARQUET 
          iceberg.compression-codec: GZIP 
          hive.config.resources: "/usr/trino/etc/catalog/core-site.xml"

`replicas`¶

Description: The replicas attribute specifies the number of Minerva Cluster replicas.

Data Type	Requirement	Default Value	Possible Value
integer	mandatory	1	any valid positive integer

Example Usage:

cluster:
  minerva:
    replicas: 2

`secrets`¶

Description: The secrets attribute is a list of secrets referred by Minerva Cluster.

Example Usage:

cluster:
  minerva:
    secrets:
      - mysecret

`depots`¶

Description: The depots attribute is a list of depot configurations that specify the sources to be queried. It can include only those sources on which a depot can be created and which support querying from a Minerva Cluster.

Data Type	Requirement	Default Value	Possible Value
list of mappings	optional	none	none

Each depot configuration comprises the following attributes:

address: The depot's address.
properties: Properties specific to the depot.
secrets: List of Secret Resources referenced by the depot.

Example Usage:

cluster:
  minerva:
    depots:
      - address: dataos://lakehouse:default
        properties:
          iceberg.file-format: PARQUET
          iceberg.compression-codec: GZIP
          hive.config.resources: "/usr/trino/etc/catalog/core-site.xml"
        secrets:
          - name: newsecret
            workspace: curriculum
            key: newone
            keys:
              - newone
              - oldone
            allKeys: true
            consumptionType: envVars

`address`¶

Description: specifies the address of a depot.

Data Type	Requirement	Default Value	Possible Value
string	mandatory	none	valid depot UDL address

Example Usage:

cluster:
  minerva:
    depots:
      - address: dataos://lakehouse:default

`properties`¶

Description: additional properties for a depot

Data Type	Requirement	Default Value	Possible Value
mapping	optional	none	none

Example Usage:

cluster:
  minerva:
    depots:
      - properties:
          iceberg.file-format: PARQUET
          iceberg.compression-codec: GZIP
          hive.config.resources: "/usr/trino/etc/catalog/core-site.xml"

`secrets`¶

Description: Secret Resource referred by the depot/catalog.

Additional Information: For more information, refer to the Secrets documentation.

`catalogs`¶

Description: The catalogs attribute specifies sources for which a depot cannot be created, but for which a Trino connector is available and supported.

Data Type	Requirement	Default Value	Possible Value
list of mappings	optional	none	none

Example Usage:

cluster:
  minerva:
    catalogs:
      - name: cache
        type: memory
        properties:
          memory.max-data-per-node: "128MB"

Each catalog configuration includes the following attributes:

name: The catalog name.
type: The catalog type.
properties: Catalog-specific properties.
secrets: List of secrets used by the catalog.

Example Usage:

cluster:
  minerva:
    catalogs:
      - name: cache
        type: memory
        properties:
          memory.max-data-per-node: "128MB"
        secrets:
          - name: newsecret
            workspace: curriculum
            key: newone
            keys:
              - newone
              - oldone
            allKeys: true
            consumptionType: envVars

`name`¶

Description: specifies the name of a catalog

Data Type	Requirement	Default Value	Possible Value
string	mandatory	none	any valid string

Example Usage:

cluster:
  minerva:
    catalogs:
      - name: cache

`type`¶

Description: specifies the type of a catalog

Data Type	Requirement	Default Value	Possible Value
string	mandatory	none	any catalog type for which a supported Trino connector is available

Example Usage:

cluster:
  minerva:
    catalogs:
      - type: memory

`properties`¶

Description: additional properties for a catalog

Data Type	Requirement	Default Value	Possible Value
mapping	optional	none	valid connector properties

Example Usage:

cluster:
  minerva:
    catalogs:
      - properties:
          memory.max-data-per-node: "128MB"

`debug`¶

Description: The debug section includes debug-related configurations for Minerva.

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

cluster:
  minerva:
    debug:
      logLevel: INFO
      trinoLogLevel: ERROR

`logLevel`¶

Description: The logLevel attribute specifies the log level for Minerva's logs.

Data Type	Requirement	Default Value	Possible Value
string	optional	INFO	INFO/DEBUG/ERROR

Example Usage:

cluster:
  minerva:
    debug:
      logLevel: INFO

`trinoLogLevel`¶

Description: The trinoLogLevel attribute specifies the log level for Trino logs within Minerva.

Data Type	Requirement	Default Value	Possible Value
string	optional	INFO	INFO/DEBUG/ERROR

Example Usage:

cluster:
  minerva:
    debug:
      trinoLogLevel: ERROR

`coordinatorEnvs`¶

Description: The coordinatorEnvs section includes environment variables for the coordinator node.

Example Usage:

cluster:
  minerva:
    coordinatorEnvs:
      alpha: beta

`workerEnvs`¶

Description: The workerEnvs section includes environment variables for worker nodes.

Example Usage:

cluster:
  minerva:
    workerEnvs:
      gamma: sigma

`overrideDefaultEnvs`¶

Description: The overrideDefaultEnvs attribute specifies whether to override default environment variables.

Data Type	Requirement	Default Value	Possible Value
boolean	optional	true	true or false

Example Usage:

cluster:
  minerva:
    overrideDefaultEnvs: true

`spillOverVolume`¶

Description: The spillOverVolume attribute specifies the spill-over volume.

Data Type	Requirement	Default Value	Possible Value
string	optional	twenty	any valid string

Example Usage:

cluster:
  minerva:
    spillOverVolume: twenty

`selector`¶

Description: The selector section defines a selector for users, tags, sources, match, and priority.

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

cluster:
  minerva:
    selector:
      users:
        - "**"
      sources:
        - scanner/**
        - flare/**

`users`¶

Description: the users attribute specifies users, each identified by a tag or a regex pattern. Several tags can be given as a list to select a group of users.

Data Type	Requirement	Default Value	Possible Value
list of strings	mandatory	none	a valid subset of all available users within DataOS

Example Usage:

cluster:
  minerva:
    selector:
      users:
        - "**"

`tags`¶

Description: The tags attribute specifies a list of tags. The cluster is accessible exclusively to users who possess specific tags.

Data Type	Requirement	Default Value	Possible Value
list of strings	optional	none	any valid tag or pattern

Additional Information: Multiple users can be specified using AND/OR logical rules.

Example Usage:

cluster:
  minerva:
    selector:
      tags:
        - alpha
        - beta

`sources`¶

Description: the sources attribute specifies the sources that can redirect queries to the Cluster.

Data Type	Requirement	Default Value	Possible Value
list of strings	mandatory	none	list of strings representing sources. For all sources, specify "**".

Example Usage:

cluster:
  minerva:
    selector:
      sources:
        - scanner/**
        - flare/**
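
How a source name is compared against these patterns is not spelled out on this page. As a rough illustration only, the Python sketch below treats each entry as a shell-style glob using the standard `fnmatch` module, where `*` matches any run of characters; the source names and the `source_allowed` helper are hypothetical, and the actual DataOS matching rules may differ.

```python
from fnmatch import fnmatchcase


def source_allowed(source: str, patterns: list[str]) -> bool:
    """Return True if the source matches at least one selector pattern."""
    return any(fnmatchcase(source, pattern) for pattern in patterns)


patterns = ["scanner/**", "flare/**"]
print(source_allowed("scanner/ingest-job", patterns))  # True
print(source_allowed("workbench/session", patterns))   # False
print(source_allowed("workbench/session", ["**"]))     # True
```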

`match`¶

Description: The match attribute specifies the match condition.

Data Type	Requirement	Default Value	Possible Value
string	mandatory	any	any/all

Additional Information:

- any: must match at least one tag
- all: must match all tags

Example Usage:

cluster:
  minerva:
    selector:
      match: any
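
The difference between the two match values can be expressed in a few lines. This is a minimal Python sketch, assuming tags are compared by exact string equality (tag patterns are not handled); `tags_match` is a hypothetical helper, not a DataOS function.

```python
def tags_match(user_tags: set[str], selector_tags: list[str], match: str = "any") -> bool:
    """Apply the selector's match condition to a user's tags."""
    hits = [tag in user_tags for tag in selector_tags]
    if match == "all":
        return all(hits)   # every selector tag must be present
    if match == "any":
        return any(hits)   # one selector tag is enough
    raise ValueError(f"unsupported match value: {match!r}")


selector_tags = ["alpha", "beta"]
print(tags_match({"alpha"}, selector_tags, "any"))                   # True
print(tags_match({"alpha"}, selector_tags, "all"))                   # False
print(tags_match({"alpha", "beta", "gamma"}, selector_tags, "all"))  # True
```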

`priority`¶

Description: The priority attribute specifies the priority level. Workloads are redirected to the Cluster with the lower priority value (inverse relationship).

Data Type	Requirement	Default Value	Possible Value
integer	mandatory	10	any value between 1-5000

Example Usage:

cluster:
  minerva:
    selector:
      priority: 10
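
The inverse relationship means the smallest number wins. A short Python sketch of that rule follows, assuming the candidate Clusters have already passed the selector checks; the Cluster names and the `pick_cluster` helper are hypothetical.

```python
clusters = [
    {"name": "general", "priority": 100},
    {"name": "reporting", "priority": 10},
    {"name": "adhoc", "priority": 50},
]


def pick_cluster(candidates: list[dict]) -> dict:
    """Pick the candidate with the lowest priority value (lower value wins)."""
    for cluster in candidates:
        if not 1 <= cluster["priority"] <= 5000:
            raise ValueError(f"priority out of range for {cluster['name']}")
    return min(candidates, key=lambda cluster: cluster["priority"])


print(pick_cluster(clusters)["name"])  # reporting
```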

`nats`¶

Description: The nats section defines configurations for NATS Cluster.

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

cluster:
  nats:
    replicas: 5 
    volumeType: newone 
    volumeSize: 30mi 
    maxConnections: 4 
    roles: 
      - name: newrole 
        permissions: 
          publish: 
            - alpha
            - beta
          subscribe: 
            - newone
            - develop
          allow_responses: true

`replicas`¶

Description: The replicas attribute specifies the number of NATS Cluster replicas.

Data Type	Requirement	Default Value	Possible Value
integer	mandatory	1	positive integer

Example Usage:

cluster:
  nats:
    replicas: 5

`volumeType`¶

Description: The volumeType attribute specifies the volume type for NATS.

Data Type	Requirement	Default Value	Possible Value
string	mandatory	none	valid string

Example Usage:

cluster:
  nats:
    volumeType: newone

`volumeSize`¶

Description: The volumeSize attribute specifies the volume size for NATS.

Data Type	Requirement	Default Value	Possible Value
string	mandatory	none	valid string

Example Usage:

cluster:
  nats:
    volumeSize: 30mi

`maxConnections`¶

Description: The maxConnections attribute specifies the maximum number of NATS connections.

Data Type	Requirement	Default Value	Possible Value
integer	mandatory	1	positive integer

Example Usage:

cluster:
  nats:
    maxConnections: 2

`roles`¶

Description: The roles attribute defines roles and permissions for NATS.

Each role includes the following attributes:

name: The role name.
permissions: The permissions for the role, including publish, subscribe, and allow_responses.

Example Usage:

cluster:
  nats:
    roles:
      - name: newrole
        permissions:
          publish:
            - alpha
            - beta
          subscribe:
            - newone
            - develop
          allow_responses: true

`jupyterHub`¶

Description: The jupyterHub section defines configurations for JupyterHub.

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

cluster:
  jupyterHub:
    ingress: 
      enabled: true
      path: /strip/
      stripPath: false
      noAuthentication: true
      appDetailSpec: random
      apiDetailSpec: random
    oidcConfig: 
      clientId: hello 
      clientSecret: delta
    storageClass: alpha
    singleUserConfig:
      volumeCapacity: alpha

`ingress`¶

Description: The ingress section specifies configurations for the JupyterHub ingress.

Attribute	Data Type	Requirement	Default Value
`enabled`	boolean	optional	true
`path`	string	optional	none
`stripPath`	boolean	optional	false
`noAuthentication`	boolean	optional	true
`appDetailSpec`	string	optional	random

Example Usage:

cluster:
  jupyterHub:
    ingress:
      enabled: true
      path: /strip/
      stripPath: false
      noAuthentication: true
      appDetailSpec: {}

`oidcConfig`¶

Description: The oidcConfig section specifies OIDC (OpenID Connect) configurations for JupyterHub.

Attribute	Data Type	Requirement	Default Value	Possible Value
`clientId`	string	mandatory	none	valid string
`clientSecret`	string	mandatory	none	valid string

Example Usage:

cluster:
  jupyterHub:
    oidcConfig:
      clientId: hello
      clientSecret: delta

`storageClass`¶

Description: The storageClass attribute specifies the storage class for JupyterHub.

Data Type	Requirement	Default Value	Possible Value
string	mandatory	none	valid string

Example Usage:

cluster:
  jupyterHub:
    storageClass: alpha

`singleUserConfig`¶

Description: The singleUserConfig section specifies configurations for single users in JupyterHub.

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

cluster:
  jupyterHub:
    singleUserConfig:
      volumeCapacity: alpha

`volumeCapacity`¶

Description: The volumeCapacity attribute specifies the volume capacity for single users.

Data Type	Requirement	Default Value	Possible Value
string	mandatory	alpha	valid string

Example Usage:

cluster:
  jupyterHub:
    singleUserConfig:
      volumeCapacity: alpha
