Bundle¶
A Bundle Resource within DataOS serves as a declarative and standardized mechanism for deploying a collection of Resources, data products, or applications in a single operation. It lets data developers programmatically orchestrate the deployment, scheduling, creation, and dismantling of the code and infrastructure resources linked to these data products and applications in a unified manner.
As implied by its name, the Bundle Resource aggregates various DataOS Resources into a flattened directed acyclic graph (DAG). Within this structure, each node represents a distinct DataOS Resource, interconnected through dependency relationships.
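To make the DAG idea concrete, the fragment below is a minimal sketch of a Bundle-specific section with three Resources chained by dependencies. The Resource ids and file paths are hypothetical and chosen only for illustration; the attribute names (id, file, dependencies) are the ones described later on this page.

```yaml
# Illustrative fragment only: ids and file paths are hypothetical.
# The workspace attribute is omitted for brevity; it is required for
# Workspace-level Resources.
bundle:
  resources:
    - id: orders-depot                # no dependencies: a root node of the DAG
      file: /home/Desktop/orders-depot.yaml
    - id: orders-scanner              # depends on orders-depot
      file: /home/Desktop/orders-scanner.yaml
      dependencies:
        - orders-depot
    - id: orders-report               # depends on orders-scanner
      file: /home/Desktop/orders-report.yaml
      dependencies:
        - orders-scanner
```

Each entry under resources is one node, and each item under dependencies is one edge pointing at the id of another node in the same Bundle.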
Why use a Bundle?¶
End-to-End Definition¶
The manifest file of the Bundle can be used to define the logical quantum for a data product - it encompasses the ‘code’ part of a ‘Data Product’. It collectively provides a comprehensive definition of a data product or application. This representation simplifies the application of software engineering best practices, including source control, code review, testing, and continuous integration/continuous deployment (CI/CD) processes.
Streamlined Deployment¶
The Bundle eliminates fragmented deployment and validation procedures, enabling robust configuration management across multiple Workspaces and cloud environments.
Structure of a Bundle YAML manifest¶
The YAML manifest outlined below details the structure of a Bundle:
# Resource meta section
name: ${{my-bundle}}
version: v1alpha
type: bundle
layer: user
tags:
  - ${{dataos:type:resource}}
  - ${{dataos:resource:bundle}}
description: ${{this bundle resource is for a data product}}
owner: ${{iamgroot}}
# Bundle-specific section
bundle:
  # Bundle Schedule section
  schedule:
    initialState: ${{initial state}}
    timezone: ${{time zone}}
    create:
      - cron: ${{cron expression}}
    delete:
      - cron: ${{cron expression}}
  # Bundle Workspaces section
  workspaces:
    - name: ${{bundlespace}} # Workspace name (mandatory)
      description: ${{this is a bundle workspace}} # Workspace description (optional)
      tags: # Workspace tags (optional)
        - ${{bundle}}
        - ${{myworkspace}}
      labels: # Workspace labels (optional)
        ${{purpose: testing}}
      layer: user # Workspace layer (mandatory)
  # Bundle Resources section
  resources:
    - id: ${{bundle-scanner}} # Resource ID (mandatory)
      workspace: ${{bundlespace}} # Workspace (optional)
      spec: # Resource spec (mandatory)
        ${{resource spec manifest}}
      file: ${{/home/Desktop/bundle-scanner.yaml}} # Resource spec file (optional)
      dependencies: # Resource dependency (optional)
        - ${{bundle-depot}}
      dependencyConditions: # Resource dependency conditions (optional)
        - resourceId: ${{bundle-depot}} # Resource ID (mandatory)
          status: # Status dependency condition (optional)
            is: # Status is (optional)
              - ${{active}}
            contains: # Status contains (optional)
              - ${{activ}}
          runtime: # Runtime dependency condition (optional)
            is: # Runtime is (optional)
              - ${{running}}
            contains: # Runtime contains (optional)
              - ${{run}}
  # Additional properties section
  properties:
    ${{additional properties}}
  # Manage As User
  manageAsUser: ${{iamgroot}}
How to create a Bundle?¶
Data developers can create a Bundle Resource by creating a YAML manifest and applying it via the DataOS CLI.
Create a Bundle YAML manifest¶
A Bundle Resource YAML manifest can be structurally broken down into the following sections:
Configure the Resource meta section¶
In DataOS, a Bundle is categorized as a Resource-type. The Resource meta section within the YAML manifest contains attributes that apply to all Resource-types. The following YAML code block shows the attributes of this section:
# Resource meta section
name: ${{my-bundle}} # Resource name (mandatory)
version: v1beta # Manifest version (mandatory)
type: bundle # Resource-type (mandatory)
tags: # Resource Tags (optional)
  - ${{dataos:type:resource}}
  - ${{dataos:resource:bundle}}
description: ${{This is a bundle yaml manifest}} # Resource Description (optional)
owner: ${{iamgroot}} # Resource Owner (optional)
bundle: # Bundle-specific section mapping (mandatory)
  ${{Attributes of Bundle-specific section}}
For more information, refer to the Attributes of Resource Meta Section.
Configure the Bundle-specific section¶
The Bundle-specific section contains attributes specific to the Bundle Resource. This section comprises four high-level sections:
- Schedule
- Workspaces
- Bundle Resources
- Additional Properties
Each of these sections should be appropriately configured when creating a Bundle YAML manifest. The high-level structure of the various separate sections within the Bundle-specific section is provided in the YAML below:
bundle:
  schedule: # Bundle schedule section (optional)
    ${{attributes for scheduling the bundle}}
  workspaces: # Bundle workspaces section (optional)
    ${{attributes specific to workspace configuration}}
  resources: # Bundle resources section (mandatory)
    ${{attributes specific to bundle resources}}
  properties: # Additional Properties (optional)
    ${{additional properties}}
  manageAsUser: ${{iamgroot}} # Manage As User (optional)
Attribute | Data Type | Default Value | Possible Value | Requirement |
---|---|---|---|---|
bundle | mapping | none | none | mandatory |
schedule | mapping | none | none | optional |
workspaces | list of mappings | none | none | optional |
resources | list of mappings | none | none | optional |
properties | mapping | none | none | optional |
manageAsUser | string | UserID of Owner | UserID of use case assignee | optional |
Bundle Schedule section
The Bundle Schedule section allows you to specify scheduling attributes for the Bundle Resource. You can schedule the creation or deletion of a Bundle at specific intervals using a cron-like schedule. The following YAML code block outlines the attributes specified in the Bundle Schedule section:
bundle: # Bundle-specific section (mandatory)
  schedule: # Bundle Schedule section (optional)
    initialState: ${{create}} # Initial State of Bundle (mandatory)
    timezone: ${{Asia/Kolkata}} # Time Zone (mandatory)
    create: # Bundle creation cron (optional)
      - cron: ${{'5 0 24 1 *'}}
    delete: # Bundle deletion cron (optional)
      - cron: ${{'25 0 24 1 *'}}
Refer to the table below for a summary of the attributes within the Bundle Schedule section. For detailed information about each attribute, please refer to the respective links provided in the attribute column.
Attribute | Data Type | Default Value | Possible Value | Requirement |
---|---|---|---|---|
initialState | string | none | create/delete | mandatory |
timezone | string | none | valid timezone in the “Area/Location” format | mandatory |
create | list of mappings | none | none | optional |
cron | string | none | valid cron expression | mandatory |
delete | list of mappings | none | none | optional |
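As a concrete reading of the schedule template, the sketch below fills in the sample values used in this section. It assumes standard five-field cron syntax (minute, hour, day of month, month, day of week); under that assumption, '5 0 24 1 *' fires at 00:05 on 24 January and '25 0 24 1 *' fires twenty minutes later, both evaluated in the configured time zone. Whether DataOS accepts any cron extensions beyond the five-field form is not covered on this page.

```yaml
# Fragment of a Bundle manifest: only the schedule section is shown.
bundle:
  schedule:
    initialState: create        # one of: create, delete
    timezone: Asia/Kolkata      # "Area/Location" format
    create:
      - cron: '5 0 24 1 *'      # 00:05 on 24 January (five-field cron assumed)
    delete:
      - cron: '25 0 24 1 *'     # 00:25 on 24 January
```

Quoting the cron strings keeps the YAML parser from treating the asterisk as special syntax.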
Bundle Workspaces section
Within a Bundle, you have the option to deploy Workspace-level Resources in a new Workspace. To achieve this, you must declare the specifications of this new Workspace and reference it within the corresponding DataOS Workspace-level Resource within the Bundle Resources section.
The following YAML code block outlines the attributes specified in the Bundle Workspaces section:
bundle: # Bundle-specific section (mandatory)
  workspaces: # Bundle Workspaces section (optional)
    - name: ${{bundlespace}} # Workspace name (mandatory)
      description: ${{this is a bundle workspace}} # Workspace description (optional)
      tags: # Workspace tags (optional)
        - ${{bundle}}
        - ${{myworkspace}}
      labels: # Workspace labels (optional)
        ${{purpose: testing}}
      layer: ${{user}} # Workspace layer (mandatory)
Refer to the table below for a summary of the attributes within the Bundle Workspaces section. For detailed information about each attribute, please refer to the respective links provided in the attribute column.
Attribute | Data Type | Default Value | Possible Value | Requirement |
---|---|---|---|---|
name | string | none | valid Workspace name | mandatory |
description | string | none | any valid string | optional |
tags | list of strings | none | any valid string | optional |
labels | mapping | none | valid key-value pairs | optional |
layer | string | none | user/system | mandatory |
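The fragment below is a sketch of how a Workspace declared in the Bundle can be referenced by a Workspace-level Resource in the same manifest. The Workspace name bundlespace and the Resource id bundle-scanner are taken from the templates on this page; the label and file path are placeholders rather than values DataOS requires.

```yaml
# Fragment of a Bundle manifest: a new Workspace and one Resource placed in it.
bundle:
  workspaces:
    - name: bundlespace
      description: this is a bundle workspace
      labels:
        purpose: testing          # placeholder label
      layer: user
  resources:
    - id: bundle-scanner
      workspace: bundlespace      # refers to the Workspace declared above
      file: /home/Desktop/bundle-scanner.yaml   # placeholder path
```

The link between the two sections is purely by name: the value of workspace under the Resource repeats the name given under workspaces.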
Bundle Resources section
The Bundle Resources section allows you to define the Resources that make up the data product or application, along with their dependencies, in the form of a flattened DAG. Each node within this DAG represents a Resource, and the nodes are interconnected through a set of dependencies. Using the dependencies and dependencyConditions attributes, you can specify both the relationships between Resources and the conditions for instantiation: a Resource is instantiated only once the Resources it depends on report the required status, runtime, or both; otherwise it is not instantiated. The following YAML code block outlines the attributes specified in the Bundle Resources section:
bundle: # Bundle-specific section (mandatory)
  resources:
    - id: ${{bundle-scanner}} # Resource ID (mandatory)
      workspace: ${{bundlespace}} # Workspace (optional)
      spec: # Resource spec (mandatory)
        ${{resource spec manifest}}
      file: ${{/home/Desktop/bundle-scanner.yaml}} # Resource spec file (optional)
      dependencies: # Resource dependency (optional)
        - ${{bundle-depot}}
      dependencyConditions: # Resource dependency conditions (optional)
        - resourceId: ${{bundle-depot}} # Resource ID (mandatory)
          status: # Status dependency condition (optional)
            is: # Status is (optional)
              - ${{active}}
            contains: # Status contains (optional)
              - ${{activ}}
          runtime: # Runtime dependency condition (optional)
            is: # Runtime is (optional)
              - ${{running}}
            contains: # Runtime contains (optional)
              - ${{run}}
Refer to the table below for a summary of the attributes within the Bundle Resources section. For detailed information about each attribute, please refer to the respective links provided in the attribute column.
Attribute | Data Type | Default Value | Possible Value | Requirement |
---|---|---|---|---|
id | string | none | valid string | mandatory |
workspace | string | none | valid Workspace name | optional (Mandatory for Workspace-level Resources) |
spec | mapping | none | valid Resource spec | optional |
file | string | none | valid Resource config file path | optional |
dependencies | list of strings | none | Resource ids within the DAG except the current one | optional |
dependencyConditions | list of mappings | none | none | optional |
resourceId | string | none | ID of dependent Resource | mandatory |
status | mapping | none | none | optional |
is | list of strings | none | none | optional |
contains | list of strings | none | none | optional |
runtime | mapping | none | none | optional |
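A short sketch of how dependencies and dependencyConditions might be combined for two Resources. The ids bundle-depot and bundle-scanner and the status value active come from the template in this section; the file paths are placeholders. Reading is as an exact match and contains as a substring match is an assumption based on the attribute names, and the full set of status and runtime values DataOS reports is not listed on this page.

```yaml
# Fragment of a Bundle manifest: bundle-scanner waits for bundle-depot.
bundle:
  resources:
    - id: bundle-depot
      file: /home/Desktop/bundle-depot.yaml       # placeholder path
    - id: bundle-scanner
      workspace: bundlespace
      file: /home/Desktop/bundle-scanner.yaml     # placeholder path
      dependencies:
        - bundle-depot                  # edge in the DAG
      dependencyConditions:
        - resourceId: bundle-depot      # the Resource whose state is checked
          status:
            is:
              - active                  # assumed: exact match on reported status
          # A runtime block takes the same is / contains keys, for example:
          # runtime:
          #   contains:
          #     - run                   # assumed: substring match
```

A runtime condition is left commented out here because not every Resource reports a runtime; in the sample CLI output on this page the RUNTIME column is empty for Bundles.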
Additional Properties section
The Additional Properties section lets you include any additional key-value properties relevant to the Bundle Resource. The following YAML code block outlines the attributes specified in the additional properties section:
bundle: # Bundle-specific section (mandatory)
  properties: # Additional properties section (optional)
    ${{alpha: beta}}
    ${{gamma: sigma}}
Refer to the table below for a summary of the attributes within the additional properties section. For detailed information about each attribute, please refer to the respective links provided in the attribute column.
Attribute | Data Type | Default Value | Possible Value | Requirement |
---|---|---|---|---|
properties | mapping | none | properties in the form of key-value pairs | optional |
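For illustration, the fragment below fills the properties mapping with two arbitrary key-value pairs and sets manageAsUser alongside it. The keys, values, and user ID are made up for this example; this page does not document which property keys, if any, DataOS itself interprets.

```yaml
# Fragment of a Bundle manifest: keys and values are hypothetical.
bundle:
  properties:
    team: analytics
    environment: staging
  manageAsUser: iamgroot      # user ID on whose behalf the Bundle is managed
```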
Data developers can customize the behavior of Bundle Resources by configuring the sections and attributes as needed. For detailed insights into the description and constraints of the attributes within the Bundle-specific section, please consult the 'Attributes of Bundle-specific Section' documentation page.
Sample Bundle YAML manifest
# RESOURCE META SECTION
name: alphabeta
version: v1beta
type: bundle
tags:
  - dataproduct
  - product
description: This bundle resource is for product data product.
layer: "user"
# BUNDLE-SPECIFIC SECTION
bundle:
  # Bundle workspaces section
  workspaces:
    - name: testingspace
      description: This workspace runs dataos bundle resource for demo
      tags:
        - dataproduct
        - product
        - bundleResource
      labels:
        "name": dataproductBundleResources
      layer: user
  # Bundle resources section
  resources:
    - id: depot
      spec:
        name: snowflakedepot01
        version: v1
        type: depot
        tags:
          - snowflake
          - depot
        layer: user
        depot:
          type: snowflake
          description: snowflake depot
          spec:
            warehouse: DATAOS_WAREHOUSE
            url: jk42400.europe-west4.gcp.snowflakecomputing.com
            database: snowflake_sample_data
          external: true
          connectionSecret:
            - acl: rw
              type: key-value-properties
              data:
                username: XPLORERSDATA
                password: Priyansh@2007
    - id: scanner
      spec:
        version: v1
        name: snowflakedepotscanner
        type: workflow
        tags:
          - Scanner
        title: Scan snowflake-depot
        description: |
          The purpose of this workflow is to scan snowflake and see if scanner works fine with a snowflake type of depot.
        workflow:
          dag:
            - name: scan-snowflake
              title: Scan snowflake
              description: |
                The purpose of this job is to scan snowflake and see if scanner works fine with a snowflake type of depot.
              tags:
                - Scanner
              spec:
                stack: scanner:2.0
                compute: runnable-default
                # runAsUser: metis
                stackSpec:
                  depot: snowflakedepot01
                  sourceConfig:
                    config:
                      schemaFilterPattern:
                        includes:
                          - ^tpch_sf10$
                      tableFilterPattern:
                        includes:
                          - customer
                          - nation
                          - region
      workspace: testingspace
      dependencies:
        - depot
      dependencyConditions:
        - resourceId: depot
          status:
            is:
              - active
Apply the Bundle YAML¶
After creating the Bundle YAML manifest, it's time to apply it to instantiate the Resource-instance in the DataOS environment. To apply the Bundle YAML file, utilize the apply command.
dataos-ctl apply -f ${{yaml config file path}}
# Sample
dataos-ctl apply -f /home/Desktop/my-bundle.yaml
Verify Bundle Creation¶
To ensure that your Bundle has been successfully created, you can verify it in two ways:
Check the name of the newly created bundle in the list of bundles created by you:
dataos-ctl get -t bundle
# Expected Output
INFO[0000] 🔍 get...
INFO[0000] 🔍 get...complete
NAME | VERSION | TYPE | WORKSPACE | STATUS | RUNTIME | OWNER
----------------|---------|--------|-----------|--------|---------|----------------
my-bundle | v1beta | bundle | | active | | iamgroot
Alternatively, retrieve the list of all bundles created in your organization:
dataos-ctl get -t bundle -a
# Expected Output
INFO[0000] 🔍 get...
INFO[0000] 🔍 get...complete
NAME | VERSION | TYPE | WORKSPACE | STATUS | RUNTIME | OWNER
----------------|---------|--------|-----------|--------|---------|----------------
my-bundle | v1beta | bundle | | active | | iamgroot
bundle101 | v1beta | bundle | | active | | thor
You can also access the details of any created Bundle through the DataOS GUI in the Operations App.
Deleting Bundles¶
When a Bundle Resource is deleted, it triggers the removal of all resources associated with that Bundle, including Workspaces (if there are any specified within the Bundle definition).
Before deleting the Bundle, you must delete all workloads or resources that are dependent on it. This step ensures that there are no dependencies left that could cause issues during deletion. Once it's done, use the delete command to remove the specific Bundle Resource-instance from the DataOS environment:
# METHOD 1
dataos-ctl delete -t bundle -n ${{name of bundle}}
# Sample
dataos-ctl delete -t bundle -n my-bundle
# METHOD 2
dataos-ctl delete -i "${{identifier string for a resource}}"
# Sample
dataos-ctl delete -i "my-bundle | v1beta | bundle | "
Bundle Templates¶
The Bundle templates serve as blueprints, defining the structure and configurations for various types of Bundles, making it easier for data developers to consistently deploy resources. To know more, refer to the link: Bundle Templates.
Bundle Command Reference¶
Here is a reference to the various commands related to managing Bundles in DataOS:
- Applying a Bundle: Use the following command to apply a Bundle using a YAML configuration file:
  dataos-ctl apply -f ${{yaml config file path}}
- Get Bundle Status: To retrieve the status of the Bundles created by you, use the following command:
  dataos-ctl get -t bundle
- Get Status of all Bundles: To get the status of all Bundles within the current context, use this command:
  dataos-ctl get -t bundle -a
- Generate Bundle JSON Schema: The JSON schema for a Bundle with a specific version (e.g., v1alpha) can be generated through the DataOS CLI.
- Get Bundle JSON Resource Schema: The JSON schema for a Bundle Resource with a specific version (e.g., v1alpha) can be retrieved from Poros through the DataOS CLI.
- Delete Bundles: To delete a specific Bundle, use the following command:
  dataos-ctl delete -t bundle -n ${{name of bundle}}