How to create your own Stack?¶
Prerequisites¶
Understanding of Docker and Kubernetes¶
A foundational grasp of Docker and Kubernetes is essential for creating basic Stacks; the data developer is expected to be proficient at containerizing application code. For more advanced Stacks involving custom Operators (such as the Flink or Spark Operators), expertise in Kubernetes Operators and Custom Resource Definitions (CRDs) is recommended.
JSON Schema Proficiency¶
A data developer must be familiar with JSON Schema. JSON Schemas play a crucial role in validating applied YAML manifest specs against the schema predefined within the Stack definition. This involves specifying attributes, their data types, and whether they are required.
Go Text Templates Proficiency¶
A data developer must be familiar with Go text templates, as these are integral to specifying YAML configuration file templates and Resource configuration templates within the Stack definition.
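As a sketch of how these templates behave: the fragment below, modeled on the `configFileTemplate` shown later in this document, substitutes fields that the orchestrator supplies at render time. The rendered values are hypothetical, purely for illustration:

```yaml
# Go text template fragment, as authored in a Stack definition:
logger:
  level: {{ .ApplicationSpec.LogLevel }}
  static_fields:
    '@service': {{ .Name }}

# Possible rendered output, assuming LogLevel "INFO" and Name "my-stack" (hypothetical):
logger:
  level: INFO
  static_fields:
    '@service': my-stack
```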
Building a Stack¶
The points below outline the high-level steps involved in creating a custom Stack Resource within a DataOS instance:

- Define the Stack image
- Create a Stack YAML manifest
    a. Configure the Resource meta section
    b. Configure the Stack-specific section
- Apply the Stack YAML manifest
Definition of Stack Image¶
The first step in creating a new Stack is developing its code. The Stack image encapsulates all of the Stack's functionality and capabilities, and defines how it interacts with its orchestrating Resource.
For a Stack to leverage existing DataOS guarantees, such as Depots and native governance, a data developer must build a layer on top of its code using the DataOS Software Development Kits (SDKs). For detailed insight into this process, refer to the reference implementations of the DataOS Soda-Python Stack, DataOS Steampipe Stack, and DataOS DBT Stack.
Upon completion, containerize the code and push the resulting image to a container registry.
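For instance, assuming a Dockerfile at the project root and a Docker Hub repository (the repository name and tag below are placeholders), the image can be built and pushed as follows:

```shell
# Build the Stack image from the project's Dockerfile (names are placeholders)
docker build -t docker.io/my-repo/my-stack:0.1.0 .

# Push the image to the container registry so DataOS can pull it
docker push docker.io/my-repo/my-stack:0.1.0
```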
Creation of Stack YAML manifest¶
After the successful creation and upload of the Stack image to a container registry, the subsequent step involves creating a Stack YAML manifest.
The ensuing sections delineate the various sections of a Stack YAML.
Configure the Resource meta section¶
In DataOS, a Stack is categorized as a Resource-type. The Resource meta section within the YAML manifest encompasses attributes universally applicable to all Resource-types. The provided YAML codeblock elucidates the requisite attributes for this section:
```yaml
name: ${{my-stack}}
version: v1alpha
type: stack
tags:
  - ${{dataos:type:resource}}
  - ${{dataos:resource:stack}}
description: ${{This is a sample stack yaml manifest}}
owner: ${{iamgroot}}
stack: # Stack-specific section
  ${{Attributes of Stack-specific section}}
```
For detailed information on the attributes within the Resource meta section, please refer to Attributes of Resource Section.
Configure the Stack-specific section¶
The Stack-specific section delineates attributes exclusive to the Stack Resource. The provided code block illustrates a sample Stack-specific section:
```yaml
stack:
  # Stack meta section
  name: benthosnew
  version: "3.0"
  reconciler: "stackManager"

  # DataOS Address JQ Filters
  dataOsAddressJqFilters:
    - .inputs[]
    - .outputs[]

  # Secret Projection
  secretProjection:
    type: "propFile"

  # Image Specification section
  image:
    registry: docker.io
    repository: rubiklabs
    image: benthos-ds
    tag: 0.8.15-dev
    auth:
      imagePullSecret: dataos-container-registry

  # Command and Argument section
  command:
    - /opt/dataos/benthos-ds
  arguments:
    - run
    - -c
    - /etc/dataos/config/jobconfig.yaml

  # Stack Spec Value JSON Schema
  stackSpecValueSchema:
    jsonSchema: |
      {"$schema":"http://json-schema.org/draft-04/schema#","type":"object","properties":{"input":{"type":"object"},"metrics":{"type":"object"},"logger":{"type":"object"},"http":{"type":"object"},"pipeline":{"type":"object"},"output":{"type":"object"}},"required":["input"]}

  # Orchestrator (Worker/Service/Workflow) configuration
  serviceConfig:
    configFileTemplate: |
      jobconfig.yaml: |
      {{ toYaml .ApplicationSpec.StackSpec | indent 2 }}
        logger:
          level: {{.ApplicationSpec.LogLevel}}
          add_timestamp: true
          format: json
          static_fields:
            '@service': {{.Name}}
        http:
          address: 0.0.0.0:{{.MetricPort}}
          root_path: /dataos-benthos
          debug_endpoints: false
        metrics:
          prometheus:
            push_url: "http://prometheus-pushgateway.sentinel.svc.cluster.local:9091"
            push_interval: "5s"
            push_job_name: "{{ .Name }}-{{ .Type }}{{ .Stamp }}"
```
The Stack-specific section consists of the following subsections, each of which must be configured for a Stack Resource instance to come to life.
Stack meta section
The Stack meta section delineates metadata attributes that give the Stack Resource a distinctive identity for discoverability and version control.
The attributes for the Stack meta section are summarized in the table below.
| Attribute | Data Type | Default Value | Possible Value | Requirement |
|---|---|---|---|---|
| `stack` | mapping | none | none | mandatory |
| `name` | string | none | any valid string | mandatory |
| `flavor` | string | none | any valid string | optional |
| `version` | string | none | valid stack version | mandatory |
| `reconciler` | string | none | StackManager/LegacyStackManager | mandatory |
For additional details regarding the various attributes, refer to Attributes of Stack Manifest.
DataOS Address JQ Filters specification
The DataOS Address JQ Filters specify which attributes within the YAML manifest need interpolation with DataOS Addresses for depot definition and credential retrieval.
| Attribute | Data Type | Default Value | Possible Value | Requirement |
|---|---|---|---|---|
| `dataOsAddressJqFilters` | list of strings | none | valid strings | optional |
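To illustrate what a filter such as `.inputs[]` selects, the following Python sketch emulates that jq expression on a sample stack spec. This is an illustration only, not the actual Poros implementation, and the sample depot addresses are hypothetical:

```python
# Emulate the jq filter ".inputs[]" on a parsed stack spec: it yields every
# element of the top-level "inputs" array, which is where DataOS addresses
# such as "dataos://..." would be interpolated.

sample_spec = {
    "inputs": [
        "dataos://icebase:retail/city",      # hypothetical depot address
        "dataos://thirdparty:none/orders",   # hypothetical depot address
    ],
    "outputs": ["dataos://icebase:retail/city_enriched"],
}

def jq_array_elements(spec: dict, key: str) -> list:
    """Rough equivalent of the jq filter '.<key>[]'."""
    return list(spec.get(key, []))

addresses = jq_array_elements(sample_spec, "inputs")
```

Each string the filter yields is a candidate for DataOS address interpolation.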
Secret Projection Type
The secretProjection type specifies the projection of secrets, such as depot and others.
| Attribute | Data Type | Default Value | Possible Value | Requirement |
|---|---|---|---|---|
| `secretProjection` | mapping | none | none | mandatory |
| `type` | string | none | none | optional |
Image specification
The image specification outlines attributes pertaining to the Docker image and associated details necessary for discovering and pulling it from the designated container registry.
```yaml
image:
  registry: docker.io
  repository: personal
  image: mystack
  tag: 0.7.2-dev
  auth:
    imagePullSecret: mysecret
```
| Attribute | Data Type | Default Value | Possible Value | Requirement |
|---|---|---|---|---|
| `image` | mapping | none | none | optional |
| `registry` | string | none | valid container registry name | mandatory |
| `repository` | string | none | valid repository name | mandatory |
| `image` | string | none | valid image name | mandatory |
| `tag` | string | none | valid image tag | mandatory |
| `auth` | mapping | none | none | optional |
| `imagePullSecret` | string | none | valid imagePullSecret name | mandatory |
Command and Argument specification
The command and arguments for executing the Docker image are defined in this section. You can either provide the entire command within the `command` attribute or split it into separate `command` and `arguments` entries.
| Attribute | Data Type | Default Value | Possible Value | Requirement |
|---|---|---|---|---|
| `command` | list of strings | none | valid command | optional |
| `arguments` | list of strings | none | valid arguments | optional |
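For example, the two declarations below are equivalent ways of expressing the same invocation (paths taken from the Benthos example earlier in this document):

```yaml
# Split form: the executable in `command`, its flags in `arguments`
command:
  - /opt/dataos/benthos-ds
arguments:
  - run
  - -c
  - /etc/dataos/config/jobconfig.yaml

# Single form: the entire command line within `command`
command:
  - /opt/dataos/benthos-ds
  - run
  - -c
  - /etc/dataos/config/jobconfig.yaml
```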
Environment Variable specification
This section details the environment variables essential for the execution of the Stack.
```yaml
environmentVars:
  PULSAR_TOPIC_ADDRESS: dataos://systemstreams:soda/quality_profile_results
  RESOURCE_DIR_PATH: "/etc/dataos/resources"
```
| Attribute | Data Type | Default Value | Possible Value | Requirement |
|---|---|---|---|---|
| `environmentVars` | mapping | none | key-value pairs of environment variables | optional |
Ports Specification
The Ports section specifies the service and target ports, along with a unique name assigned to the port.
| Attribute | Data Type | Default Value | Possible Value | Requirement |
|---|---|---|---|---|
| `ports` | mapping | none | none | optional |
| `name` | string | none | valid port name | mandatory |
| `servicePort` | integer | none | valid port number | mandatory |
| `targetPort` | integer | none | valid port number | mandatory |
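A minimal sketch of this section, with hypothetical values:

```yaml
ports:
  name: metrics        # hypothetical port name
  servicePort: 8080    # port exposed by the Kubernetes Service
  targetPort: 8080     # port the container listens on
```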
StackSpec Value Schema section
This section supplies the JSON Schema against which the stack spec of an applied manifest is validated. The DataOS Resource API accepts the schema in JSON format; although JSON differs slightly from YAML (file references are inlined directly as base64, and jobs and steps are also inlined), the overall structure remains consistent between the two formats.
```yaml
stackSpecValueSchema:
  jsonSchema: |
    {"$schema":"http://json-schema.org/draft-04/schema#","type":"object","properties":{"input":{"type":"object"},"metrics":{"type":"object"},"logger":{"type":"object"},"http":{"type":"object"},"pipeline":{"type":"object"},"output":{"type":"object"}},"required":["input"]}
```
| Attribute | Data Type | Default Value | Possible Value | Requirement |
|---|---|---|---|---|
| `stackSpecValueSchema` | mapping | none | none | optional |
| `jsonSchema` | string | none | valid JSON schema in string format | mandatory |
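As a sketch of how such a schema constrains an applied spec, the Python snippet below parses the schema shown above using only the standard library and applies a deliberately minimal check of its `required` rule. A real validator (for example, the `jsonschema` package) enforces the full Draft-04 specification; this is an illustration only:

```python
import json

# The jsonSchema string exactly as it appears in the Stack definition above
schema_str = '{"$schema":"http://json-schema.org/draft-04/schema#","type":"object","properties":{"input":{"type":"object"},"metrics":{"type":"object"},"logger":{"type":"object"},"http":{"type":"object"},"pipeline":{"type":"object"},"output":{"type":"object"}},"required":["input"]}'
schema = json.loads(schema_str)

def check_required(spec: dict, schema: dict) -> list:
    """Return the required top-level keys that are missing from the spec.

    This is a minimal illustration of one JSON Schema rule, not a full
    Draft-04 validator.
    """
    return [key for key in schema.get("required", []) if key not in spec]

# A spec containing an "input" section satisfies the required list...
assert check_required({"input": {}, "output": {}}, schema) == []
# ...while one without it would be rejected.
assert check_required({"output": {}}, schema) == ["input"]
```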
Orchestrator Configuration
A Stack encapsulates the logic for a specific task, while orchestrators like Workflow, Worker, or Service define how the Stack should be utilized for a particular use case. When these orchestrators are applied, the Kubernetes resources of the respective Stack get created. The YAML structure of a Stack is created at runtime, giving users the flexibility to decide the desired Stack structure and the information to include in the Stack definition. Poros, the DataOS Orchestrator, does not have hardcoded configurations; instead, it allows users to customize the specification according to their requirements.
When defining Stacks within the DataOS ecosystem, there are two primary approaches: Containerized Resources and Resource Dependent. Each serves different use cases and provides flexibility based on the complexity and customization needs of the user.
Containerized Resources
By opting for Containerized Resources within the orchestrator config, users adhere to the standard default template provided by the DataOS Orchestrator, Poros. This template, a basic YAML configuration for the pods, comes with a predefined structure and attributes. The generic template can be overridden by passing a custom definition in the `containerResourceTemplate` attribute. Users only need to provide a `configFileTemplate` that the Stack will use. This approach suits use cases where the standard template provided by Poros is sufficient. For tasks like consuming data from an API, performing transformations, and writing data to databases, users can rely on the default template without providing additional pod information.
Resource Dependent
The Resource Dependent approach involves declaring custom Resources and interacting with external Operators. For instance, when using the Spark Operator, the user defines a template understood by the Spark Operator, and Poros interpolates values into the corresponding Stack Resource. In this case, users need to understand the Operator, its requirements, and the resources it expects as input. For example, Spark templates require specific inputs, and users must ensure their Stack definitions align with the expectations of the Spark Operator. Users may also need to specify URLs and connection information for different services or APIs as environment variables; Poros may propagate this information to the Operator, allowing the application code to call these clients.
The choice between Containerized Resources and Resource Dependent depends on the use case and the level of customization required. Containerized Resources are suitable for standard tasks where the default template suffices. Resource Dependent is more appropriate when dealing with external Operators, custom Resources, and intricate configuration needs. Users can select the approach that aligns with their specific requirements and understanding of the DataOS ecosystem.
Workflow Configuration
```yaml
workflowJobConfig:
  configFileTemplate: ${{config file template}}
  containerResourceTemplate: ${{container resource template}}
  resourceTemplateConfig:
    resourceTemplate: ${{resource template}}
    successCondition: ${{success condition}}
    failureCondition: ${{failure condition}}
```
| Attribute | Data Type | Default Value | Possible Value | Requirement |
|---|---|---|---|---|
| `workflowJobConfig` | mapping | none | none | optional |
| `configFileTemplate` | string | none | valid config file template | optional |
| `containerResourceTemplate` | string | none | valid container resource template | optional |
| `resourceTemplateConfig` | mapping | none | none | optional |
| `resourceTemplate` | string | none | valid resource template | mandatory |
| `successCondition` | string | none | valid success condition | mandatory |
| `failureCondition` | string | none | valid failure condition | mandatory |
To declare a Stack that will be orchestrated using a Workflow as a Containerized Resource, click on the link: Stack orchestrated via a Workflow declared using a config template. To declare a Stack using a custom Resource template that will be orchestrated using a Workflow, refer to the link: Stack orchestrated via a Workflow declared using a Resource template.
Worker Configuration
```yaml
workerConfig:
  configFileTemplate: ${{config file template}}
  containerResourceTemplate: ${{container resource template}}
  resourceTemplate: ${{resource template}}
```
| Attribute | Data Type | Default Value | Possible Value | Requirement |
|---|---|---|---|---|
| `workerConfig` | mapping | none | none | optional |
| `configFileTemplate` | string | none | valid config file template | optional |
| `containerResourceTemplate` | string | none | valid container Resource template | optional |
| `resourceTemplate` | string | none | valid resource template | optional |
To declare a Stack that will be orchestrated using a Worker as a Containerized Resource, refer to the link: Stack orchestrated via a Worker declared using a config template. To declare a Stack using a custom Resource template that will be orchestrated using a Worker, refer to the link: Stack orchestrated via a Worker declared using a Resource template.
Service Configuration
```yaml
serviceConfig:
  configFileTemplate: ${{config file template}}
  containerResourceTemplate: ${{container resource template}}
```
| Attribute | Data Type | Default Value | Possible Value | Requirement |
|---|---|---|---|---|
| `serviceConfig` | mapping | none | none | optional |
| `configFileTemplate` | string | none | valid config file template | optional |
| `containerResourceTemplate` | string | none | valid container Resource template | optional |
To take a look at a case scenario, refer to the link: Stack orchestrated by a Service declared using a config template.
Apply the Stack YAML manifest¶
Once the Stack YAML manifest is prepared, the `apply` command can be utilized to create the Stack Resource within the DataOS environment.
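For example, assuming the manifest is saved as `my-stack.yaml` (a hypothetical filename), it can be applied with the DataOS CLI:

```shell
dataos-ctl apply -f my-stack.yaml
```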
Upon successful creation of a Stack Resource, CRUD operations can be performed on top of it, and it can be orchestrated by Resources such as Workflow, Worker, or Service, by specifying the Stack attribute within their respective YAMLs.
Verify Stack creation¶
To ensure that your Stack has been successfully created, you can verify it in two ways:
Check the name of the newly created Stack in the list of Stacks where you are named as the owner:
```shell
dataos-ctl get -t stack

# Expected Output
INFO[0000] 🔍 get...
INFO[0000] 🔍 get...complete

   NAME   | VERSION | TYPE  | WORKSPACE | STATUS | RUNTIME |  OWNER
----------|---------|-------|-----------|--------|---------|----------
 my-stack | v1alpha | stack |           | active |         | iamgroot
```
Alternatively, retrieve the list of all Stacks created in your organization:
You can also access the details of any created Stacks through the DataOS GUI in the Operations App.
Attributes of Stack YAML manifest¶
This section outlines the various attributes associated with a Stack Resource, providing essential details for customization and configuration. To know more, navigate to the link: Attributes of Stack YAML manifest.