Attributes of Cluster manifest
Structure of Cluster-specific Section
cluster:
  compute: ${{query-default}} # mandatory
  runAsApiKey: ${{abcdefghijklmnopqrstuvwxyz}} # mandatory
  runAsUser: ${{minerva-cluster}}
  maintenance:
    restartCron: ${{'13 1 */2 * *'}}
    timezone: ${{Asia/Kolkata}} # mandatory
    scalingCrons:
      - cron: ${{'5/10 * * * *'}} # mandatory
        timezone: ${{Europe/Berlin}} # mandatory
        replicas: ${{2}}
        resources:
          requests:
            cpu: ${{800m}}
            memory: ${{1Gi}}
          limits:
            cpu: ${{1000m}}
            memory: ${{2Gi}}
  minerva:
    replicas: ${{2}} # mandatory
    resources:
    secrets:
      - ${{mysecret}}
    depots: # mandatory
      - address: ${{dataos://icebase:default}} # mandatory
        properties:
          iceberg.file-format: ${{PARQUET}}
          iceberg.compression-codec: ${{GZIP}}
          hive.config.resources: ${{"/usr/trino/etc/catalog/core-site.xml"}}
        secrets:
          - name: ${{newsecret}} # mandatory
            workspace: ${{curriculum}}
            key: ${{newone}}
            keys:
              - ${{newone}}
              - ${{oldone}}
            allKeys: ${{true}}
            consumptionType: ${{envVars}}
    catalogs:
      - name: ${{cache}} # mandatory
        type: ${{memory}} # mandatory
        properties:
          memory.max-data-per-node: ${{"128MB"}}
        secrets:
          - name: ${{newsecret}} # mandatory
            workspace: ${{curriculum}}
            key: ${{newone}}
            keys:
              - ${{newone}}
              - ${{oldone}}
            allKeys: ${{true}}
            consumptionType: ${{envVars}}
    debug:
      logLevel: ${{INFO}}
      trinoLogLevel: ${{ERROR}}
    coordinatorEnvs:
      ${{alpha: beta}}
    workerEnvs:
      ${{gamma: sigma}}
    overrideDefaultEnvs: true
    spillOverVolume: twenty
    selector:
      users: # mandatory
        - ${{"**"}}
      tags: # mandatory
        - ${{alpha}}
        - ${{beta}}
      sources: # mandatory
        - ${{scanner/**}}
        - ${{flare/**}}
    match: ${{''}} # mandatory
    priority: ${{'10'}} # mandatory
  nats:
    replicas: ${{5}} # mandatory
    volumeType: ${{newone}} # mandatory
    volumeSize: ${{30mi}} # mandatory
    maxConnections: ${{4}} # mandatory
    roles: # mandatory
      - name: ${{newrole}} # mandatory
        permissions:
          publish: # mandatory
            - ${{alpha}}
            - ${{beta}}
          subscribe: # mandatory
            - ${{newone}}
            - ${{develop}}
          allow_responses: ${{true}}
  jupyterHub:
    ingress:
      enabled: ${{true}}
      path: ${{/strip/}}
      stripPath: ${{false}}
      noAuthentication: ${{true}}
      appDetailSpec: ${{random}}
      apiDetailSpec: ${{random}}
    oidcConfig: # mandatory
      clientId: ${{hellow}} # mandatory
      clientSecret: ${{delta}} # mandatory
    storageClass: ${{alpha}} # mandatory
    singleUserConfig: # mandatory
      volumeCapacity: ${{alpha}} # mandatory
Configuration Attributes
cluster
Description: The `cluster` mapping/section defines configurations for the Cluster Resource.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
Example Usage:
cluster:
  compute: query-default
  runAsUser: minerva-cluster
  minerva:
    selector:
      users:
        - "**"
      sources:
        - scanner/**
        - flare/**
    replicas: 2
    match: ''
    priority: '10'
    runAsApiKey: dataos apikey
    runAsUser: iamgroot
    resources:
      limits:
        cpu: 4000m
        memory: 8Gi
      requests:
        cpu: 1200m
        memory: 2Gi
    debug:
      logLevel: INFO
      trinoLogLevel: ERROR
    depots:
      - address: dataos://icebase:default
        properties:
          iceberg.file-format: PARQUET
          iceberg.compression-codec: GZIP
          hive.config.resources: "/usr/trino/etc/catalog/core-site.xml"
      - address: dataos://bqdepot:default
    catalogs:
      - name: cache
        type: memory
        properties:
          memory.max-data-per-node: "128MB"
compute
Description: The `compute` attribute specifies the name of the Compute Resource instance referred to by the Cluster.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | query-default | any valid query-type Compute Resource-instance name |
Example Usage:
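A minimal sketch, reusing the sample value from the manifest structure at the top of this page. It assumes `compute` sits directly under `cluster`, as that structure suggests.

```yaml
cluster:
  # Name of a query-type Compute Resource instance (sample value)
  compute: query-default
```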
runAsApiKey
Description: The `runAsApiKey` attribute allows a user to assume the identity of another user by supplying the latter's API key.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | abcdefghijklmnopqrstuvwxyz | any valid DataOS user API key |
Additional Details: The API key can be obtained by running `dataos-ctl user apikey get` from the CLI. If no API key is available, run `dataos-ctl user apikey create` to create a new one.
Example Usage:
runAsUser
Description: When the `runAsUser` attribute is configured with the UserID of the use-case assignee, it grants the authority to perform operations on behalf of that user.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | user-id of the user | user-id of the use-case assignee |
Example Usage:
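A minimal sketch using the sample user ID from the manifest structure at the top of this page. The value is a placeholder, not a real account.

```yaml
cluster:
  # UserID of the use-case assignee on whose behalf the Cluster operates
  runAsUser: minerva-cluster
```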
maintenance
Available in DataOS CLI Version 2.8.2 and DataOS Version 1.10.41
Description: The `maintenance` section provides Cluster maintenance configurations that let Poros, the DataOS orchestrator, simplify and automate routine operator activities. The maintenance features are invoked on a `cron` schedule, which triggers a restart or a scaling action specific to the Cluster in question.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional | none | none |
Example Usage:
cluster:
  maintenance:
    restartCron: '13 1 */2 * *'
    timezone: Europe/Berlin # mandatory
    scalingCrons:
      - cron: '5/10 * * * *' # mandatory
        timezone: Europe/Berlin # mandatory
        replicas: 2
        resources:
          requests:
            cpu: 800m
            memory: 1Gi
          limits:
            cpu: 1000m
            memory: 2Gi
restartCron
Description: The `restartCron` attribute specifies the cron schedule for Cluster restarts. Poros, the DataOS orchestrator, will restart the Cluster based on the specified schedule.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | any valid cron expression |
Example Usage:
- To restart the Cluster at 1:13 AM every other day, set `restartCron` to `'13 1 */2 * *'`.
timezone
Description: The `timezone` attribute specifies the Cluster's timezone.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | any valid timezone from the tz database |
Example Usage:
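A minimal sketch, assuming `timezone` is set alongside `restartCron` inside `maintenance`, as in the manifest structure at the top of this page. The values are the sample ones used there.

```yaml
cluster:
  maintenance:
    restartCron: '13 1 */2 * *'
    # Any name from the tz database, for example Asia/Kolkata or Europe/Berlin
    timezone: Asia/Kolkata
```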
scalingCrons
Description: The `scalingCrons` attribute defines configurations for scaling the Cluster. Poros can horizontally and/or vertically scale the Cluster based on the provided configuration.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional | none | none |
Each scaling cron includes the following attributes:
- `cron`: The cron schedule for the job.
- `timezone`: The timezone for the job.
- `replicas`: The number of replicas.
- `resources`: Resource specifications for the job, including `requests` and `limits` for CPU and memory.
Additional Information: A `scalingCron` overrides the default `replicas` and/or `resources` of a cluster such as Minerva while its cron window is active. When a cron schedule is triggered, the supplied replicas and resources stay in effect until another cron schedule occurs. To clear an active scalingCron, remove the `scalingCrons` section and apply the Resource again.
Example Usage:
- Horizontal Scaling: To scale the Cluster horizontally every 5 minutes, set `cron` together with `replicas`.
- Vertical Scaling: To scale the Cluster vertically every 5 minutes, set `cron` together with `resources`.
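The two cases above can be sketched as follows. This is an illustration only: the cron expression `*/5 * * * *` (every 5 minutes), the replica count, and the resource values are assumptions chosen for the example, and the two entries are alternatives shown side by side rather than a recommended combination.

```yaml
cluster:
  maintenance:
    scalingCrons:
      # Horizontal scaling: change only the number of replicas
      - cron: '*/5 * * * *'
        timezone: Europe/Berlin
        replicas: 3
      # Vertical scaling: change only the CPU and memory allocation
      - cron: '*/5 * * * *'
        timezone: Europe/Berlin
        resources:
          requests:
            cpu: 800m
            memory: 1Gi
          limits:
            cpu: 1000m
            memory: 2Gi
```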
cron
Description: Specifies the cron schedule for scaling tasks in the Cluster.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | any valid cron expression |
Example Usage:
replicas
Description: Specifies the number of replicas for scaling tasks in the Cluster.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
integer | mandatory | 1 | 1-4 |
Example Usage:
resources
Description: Specifies the CPU and memory resource allocation for the Cluster.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional | none | none |
Example Usage:
cluster:
  maintenance:
    scalingCrons:
      - resources:
          limits:
            cpu: 1000m
            memory: 2Gi
          requests:
            cpu: 800m
            memory: 1Gi
limits
Description: Specifies the resource limits for CPU and memory for the specific Cluster.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional | none | none |
Example Usage:
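A minimal sketch of `limits` inside a single scaling cron entry, reusing the sample values from the `maintenance` example earlier on this page. The surrounding `cron` and `timezone` values are included only to make the entry complete.

```yaml
cluster:
  maintenance:
    scalingCrons:
      - cron: '5/10 * * * *'
        timezone: Europe/Berlin
        resources:
          # Upper bound on what the Cluster may consume
          limits:
            cpu: 1000m
            memory: 2Gi
```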
requests
Description: Specifies the resource requests for the Cluster.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional | none | none |
Example Usage:
cpu
Description: specifies the CPU resource configuration for the Cluster.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | requests: 100m, limits: 400m | cpu units in milliCPU(m) or CPU Core |
Example Usage:
memory
Description: specifies the requested memory for scaling tasks in the Cluster.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | requests: 100Mi, limits: 400Mi | memory in Mebibytes(Mi) or Gibibytes(Gi) |
Example Usage:
minerva
Description: The `minerva` attribute defines configurations for the Minerva Cluster.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
Example Usage:
cluster:
  minerva:
    replicas: 2 # mandatory
    resources:
    secrets:
      - mysecret
    depots: # mandatory
      - address: dataos://icebase:default # mandatory
        properties:
          iceberg.file-format: PARQUET
          iceberg.compression-codec: GZIP
          hive.config.resources: "/usr/trino/etc/catalog/core-site.xml"
replicas
Description: The `replicas` attribute specifies the number of Minerva Cluster replicas.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
integer | mandatory | 1 | any valid positive integer |
Example Usage:
secrets
Description: The `secrets` attribute is a list of secrets referred to by the Minerva Cluster.
Example Usage:
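A minimal sketch, assuming the list holds Secret names directly under `minerva`, as in the manifest structure at the top of this page. `mysecret` is the placeholder name used there.

```yaml
cluster:
  minerva:
    # Names of the secrets the Minerva Cluster refers to
    secrets:
      - mysecret
```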
depots
Description: The `depots` attribute is a list of depot configurations that specifies the sources to be queried. It includes only those sources on which a depot can be created and which support querying from a Minerva Cluster.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of mappings | optional | none | none |
Each depot configuration comprises the following attributes:
- `address`: The depot's address.
- `properties`: Properties specific to the depot.
- `secrets`: List of Secret Resources referred to by the depot.
Example Usage:
cluster:
  minerva:
    depots:
      - address: dataos://icebase:default
        properties:
          iceberg.file-format: PARQUET
          iceberg.compression-codec: GZIP
          hive.config.resources: "/usr/trino/etc/catalog/core-site.xml"
        secrets:
          - name: newsecret
            workspace: curriculum
            key: newone
            keys:
              - newone
              - oldone
            allKeys: true
            consumptionType: envVars
address
Description: Specifies the address for a depot.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | valid depot UDL address |
Example Usage:
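A minimal sketch using the sample depot address from the examples above. The address follows the `dataos://<depot>:<collection>` shape seen on this page; that reading of the address format is an assumption.

```yaml
cluster:
  minerva:
    depots:
      # UDL address of the depot to expose for querying
      - address: dataos://icebase:default
```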
properties
Description: Additional properties for a depot.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional | none | none |
Example Usage:
cluster:
  minerva:
    depots:
      - properties:
          iceberg.file-format: PARQUET
          iceberg.compression-codec: GZIP
          hive.config.resources: "/usr/trino/etc/catalog/core-site.xml"
secrets
Description: Secret Resource referred to by the depot/catalog.
Additional Information: For more information, refer to the Secrets documentation.
catalogs
Description: The `catalogs` attribute specifies sources in scenarios where it is not possible to create a depot, but a Trino connector is available and supported for the source.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of mappings | optional | none | none |
Example Usage:
cluster:
  minerva:
    catalogs:
      - name: cache
        type: memory
        properties:
          memory.max-data-per-node: "128MB"
Each catalog configuration includes the following attributes:
- `name`: The catalog name.
- `type`: The catalog type.
- `properties`: Catalog-specific properties.
- `secrets`: List of secrets used by the catalog.
Example Usage:
cluster:
  minerva:
    catalogs:
      - name: cache
        type: memory
        properties:
          memory.max-data-per-node: "128MB"
        secrets:
          - name: newsecret
            workspace: curriculum
            key: newone
            keys:
              - newone
              - oldone
            allKeys: true
            consumptionType: envVars
name
Description: Specifies the name of a catalog.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | any valid string |
Example Usage:
type
Description: Specifies the type of a catalog.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | View the list of all possible catalog types here |
Example Usage:
properties
Description: Additional properties for a catalog.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional | none | valid connector properties |
Example Usage:
debug
Description: The `debug` section includes debug-related configurations for Minerva.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
Example Usage:
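A minimal sketch, assuming `debug` sits under `minerva` as in the `cluster` example earlier on this page. The two log levels are the sample values used there.

```yaml
cluster:
  minerva:
    debug:
      logLevel: INFO        # log level for Minerva's own logs
      trinoLogLevel: ERROR  # log level for Trino logs within Minerva
```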
logLevel
Description: The `logLevel` attribute specifies the log level for Minerva's logs.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | INFO | INFO/DEBUG/ERROR |
Example Usage:
trinoLogLevel
Description: The `trinoLogLevel` attribute specifies the log level for Trino logs within Minerva.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | INFO | INFO/DEBUG/ERROR |
Example Usage:
coordinatorEnvs
Description: The `coordinatorEnvs` section includes environment variables for the coordinator node.
Example Usage:
workerEnvs
Description: The `workerEnvs` section includes environment variables for worker nodes.
Example Usage:
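A minimal sketch showing `coordinatorEnvs` and `workerEnvs` side by side. It assumes both are plain key-value mappings under `minerva`, which is how the manifest structure at the top of this page presents them. The keys and values (`alpha: beta`, `gamma: sigma`) are the placeholders used there, not real environment variables.

```yaml
cluster:
  minerva:
    # Environment variables set on the coordinator node
    coordinatorEnvs:
      alpha: beta
    # Environment variables set on every worker node
    workerEnvs:
      gamma: sigma
```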
overrideDefaultEnvs
Description: The `overrideDefaultEnvs` attribute specifies whether to override default environment variables.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
boolean | optional | true | true or false |
Example Usage:
spillOverVolume
Description: The `spillOverVolume` attribute specifies the spill-over volume.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | twenty | any valid string |
Example Usage:
selector
Description: The `selector` section defines a selector for users, tags, sources, match, and priority.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
Example Usage:
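A minimal sketch of the `selector` mapping, assuming it sits under `minerva` as in the `cluster` example earlier on this page. The tag names `alpha` and `beta` are the placeholders from the manifest structure.

```yaml
cluster:
  minerva:
    selector:
      users:
        - "**"          # pattern matching all users
      tags:
        - alpha
        - beta
      sources:
        - scanner/**
        - flare/**
```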
users
Description: The `users` attribute specifies users, each identified by a tag or a regex pattern. A group of tags can also be given as a list.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of strings | mandatory | none | a valid subset of all available users within DataOS |
Example Usage:
tags
Description: The `tags` attribute specifies a list of tags. The Cluster is accessible exclusively to users who possess the specified tags.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of strings | optional | none | any valid tag or pattern |
Additional Information: Multiple users can be specified using AND/OR Logical Rules. To know more, click here.
Example Usage:
sources
Description: The `sources` attribute specifies the sources that can redirect queries to the Cluster.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of strings | mandatory | none | list of strings representing source. For all sources, specify “**”. |
Example Usage:
match
Description: The `match` attribute specifies the match condition.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | any | any/all |
Additional Information:
- `any`: must match at least one tag
- `all`: must match all tags
Example Usage:
priority
Description: The `priority` attribute specifies the priority level. Workloads are redirected to the Cluster with the lower priority value (an inverse relationship).
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
integer | mandatory | 10 | any value between 1-5000 |
Example Usage:
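A minimal sketch showing `match` and `priority` together. Their placement directly under `minerva`, as siblings of `selector`, is an assumption taken from the order of keys in the `cluster` example earlier on this page. The quoted `'10'` follows that example; `any` is one of the two documented `match` values.

```yaml
cluster:
  minerva:
    match: any      # a user needs at least one of the selector tags
    priority: '10'  # lower values are preferred when redirecting workloads
```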
nats
Description: The `nats` section defines configurations for the NATS Cluster.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
Example Usage:
cluster:
  nats:
    replicas: 5
    volumeType: newone
    volumeSize: 30mi
    maxConnections: 4
    roles:
      - name: newrole
        permissions:
          publish:
            - alpha
            - beta
          subscribe:
            - newone
            - develop
          allow_responses: true
replicas
Description: The `replicas` attribute specifies the number of NATS Cluster replicas.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
integer | mandatory | 1 | positive integer |
Example Usage:
volumeType
Description: The `volumeType` attribute specifies the volume type for NATS.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | valid string |
Example Usage:
volumeSize
Description: The `volumeSize` attribute specifies the volume size for NATS.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | valid string |
Example Usage:
maxConnections
Description: The `maxConnections` attribute specifies the maximum number of NATS connections.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
integer | mandatory | 1 | positive integer |
Example Usage:
roles
Description: The `roles` attribute defines roles and permissions for NATS.
Each role includes the following attributes:
- `name`: The role name.
- `permissions`: The permissions for the role, including `publish`, `subscribe`, and `allow_responses`.
Example Usage:
cluster:
  nats:
    roles:
      - name: newrole
        permissions:
          publish:
            - alpha
            - beta
          subscribe:
            - newone
            - develop
          allow_responses: true
jupyterHub
Description: The `jupyterHub` section defines configurations for JupyterHub.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
Example Usage:
cluster:
  jupyterHub:
    ingress:
      enabled: true
      path: /strip/
      stripPath: false
      noAuthentication: true
      appDetailSpec: random
      apiDetailSpec: random
    oidcConfig:
      clientId: hellow
      clientSecret: delta
    storageClass: alpha
    singleUserConfig:
      volumeCapacity: alpha
ingress
Description: The `ingress` section specifies configurations for the JupyterHub ingress.
Attribute | Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|---|
enabled | boolean | optional | true | |
path | string | optional | none | |
stripPath | boolean | optional | false | |
noAuthentication | string | optional | true | |
appDetailSpec | string | optional | random | |
Example Usage:
cluster:
  jupyterHub:
    ingress:
      enabled: true
      path: /strip/
      stripPath: false
      noAuthentication: true
      appDetailSpec: {}
oidcConfig
Description: The `oidcConfig` section specifies OIDC (OpenID Connect) configurations for JupyterHub.
Attribute | Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|---|
clientId | string | mandatory | none | valid string |
clientSecret | string | mandatory | none | valid string |
Example Usage:
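A minimal sketch, reusing the placeholder values from the `jupyterHub` example above. `hellow` and `delta` are sample strings, not working credentials.

```yaml
cluster:
  jupyterHub:
    oidcConfig:
      clientId: hellow     # OIDC client identifier (placeholder)
      clientSecret: delta  # OIDC client secret (placeholder)
```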
storageClass
Description: The `storageClass` attribute specifies the storage class for JupyterHub.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | valid string |
Example Usage:
singleUserConfig
Description: The `singleUserConfig` section specifies configurations for single users in JupyterHub.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
Example Usage:
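A minimal sketch, reusing the placeholder from the `jupyterHub` example above. The value `alpha` is only the sample string shown on this page; the format of a real capacity value is not specified here.

```yaml
cluster:
  jupyterHub:
    singleUserConfig:
      # Volume capacity allotted to each single-user environment (placeholder)
      volumeCapacity: alpha
```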
volumeCapacity
Description: The `volumeCapacity` attribute specifies the volume capacity for single users.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | alpha | valid string |
Example Usage: