DataOS Documentation¶

In the rapidly evolving world of data, the need for robust, scalable, and efficient data products has never been more critical. DataOS is a platform that transforms how businesses create, manage, and leverage data products. DataOS empowers business teams to innovate by providing the tools and autonomy for data product development.

DataOS is an enterprise-grade data product platform that enables organizations to build, manage, and share data products. It provides the essential building blocks that data developers require to create powerful data products that drive significant business outcomes.

DataOS development is driven by core principles tailored to address specific user needs and challenges:

Consumption Ready Layer: DataOS streamlines data product consumption with context-aware discovery, secure exploration, reliable quality, and multi-interface activation through its self-serve architecture.
Data Product Lifecycle Management: DataOS is built to serve data product consumers, including data analysts/scientists and data app developers. It aims to capture the entire data product lifecycle, integrating data product consumers, owners, developers, and administrators under one roof. This holistic approach ensures seamless integration with tools that users are already familiar with.
Faster Time to Value: DataOS accelerates the development process, enabling enterprises to gain insights quickly. This reduces the time to value substantially, fueling targeted campaigns, personalizing customer journeys, and enhancing profits faster.
AI Ready: Leveraging AI agents, DataOS enhances the user experience by providing heuristic assistance that evolves based on user feedback and needs.
FinOps: DataOS provides real-time insights into resource utilization, enabling organizations to monitor and control cloud spending effectively. This strategic approach promotes shared responsibility across teams, drives significant cost savings, improves operational efficiency, and facilitates informed financial decisions.

DataOS continuously evolves to meet the real-world needs of data professionals. It significantly lowers the total cost of ownership by streamlining data operations through task automation, minimizing data movement, and simplifying maintenance.

Navigating Documentation¶

If you are new to DataOS, dataos.info is the place to get started with key technical concepts. Understanding these will help you develop data products efficiently with DataOS. For experienced developers looking to build solutions, how-to guides and reference docs are available to get you up to speed with data product creation. You'll find everything you need right here, whether you are just starting out or diving deep.

The documentation website features a top menu bar with options such as Getting Started, Data Products, Glossary, and Videos. A multi-level index is displayed in the left menu, allowing users to explore and dive deeper into specific topics within each category.

First Steps¶

The following sections in the top menu bar of the documentation will help you get started with DataOS.

Getting Started

Get hands-on and up to speed with DataOS, and familiarize yourself with its capabilities.

Learn more
Data Product

Learn to create, deploy & manage domain-specific data products at scale.

Explore more

Core Aspects¶

The following sections of the documentation, located on the left menu, provide detailed insights into the DataOS philosophy and its architecture.

Philosophy

Understand the philosophy behind DataOS, designed to simplify and abstract the complexities of traditional data infrastructure.

Read more
Architecture

Learn about the architecture of DataOS, built to democratize data, and accelerate data product creation.

See more

Understand Interfaces¶

The Interfaces section in the documentation introduces various ways to interact with DataOS services and components.

Command Line Interface (CLI)

The DataOS CLI enables efficient data operations, providing quick, flexible access to system functions for data engineers and administrators.

Learn more
Graphical User Interface (GUI)

The Graphical User Interface provides an intuitive and visually appealing method for interacting with DataOS and its components.

Learn more
Application Programming Interface (API & SDK)

The Application Programming Interface in DataOS provides programmatic access to its core components and libraries, supporting the creation of diverse applications and services.

Learn more

Understand DataOS Resources¶

The Resources section of the documentation will help you understand the primitives of DataOS that power the core functionalities necessary in any data stack. We have mapped these DataOS Resources to their functional role in the data stack:

Source Connectivity and Metadata Management¶

This category includes DataOS Resources that facilitate the connection to various data sources, scan the metadata, and run quality checks & data profiling.

Depot

Connects various data sources to DataOS, abstracting underlying complexities.
Stacks

Acts as an execution engine and integrates new programming paradigms. Key Stacks in this category are Soda and Scanner.

Data Movement and Processing¶

This category includes DataOS Resources that facilitate the movement and transformation of data.

Batch Data¶

These DataOS Resources support batch data processing, enabling scheduled, large-scale data transformations and movements.

Workflow

Manages batch data processing tasks with dependencies.
Operator

Standardizes orchestration of external resources, enabling programmatic actions from DataOS interfaces.
Stacks

Key Stacks in this category are Flare, DBT, and Data Toolbox.
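
To make "batch data processing tasks with dependencies" concrete, here is a minimal conceptual sketch in Python. It does not use any DataOS API or manifest syntax; the job names are hypothetical, and the standard-library topological sorter stands in for whatever scheduling the Workflow Resource performs internally. It only shows the general idea that a job runs after every job it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job names (not DataOS identifiers).
# Each job maps to the set of jobs that must finish before it starts.
dag = {
    "ingest": set(),
    "transform": {"ingest"},
    "quality_check": {"transform"},
    "publish": {"transform", "quality_check"},
}


def execution_order(graph):
    """Return one valid run order in which every job follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dag)
print(order)
```

A cycle in the dependency graph (for example, two jobs that depend on each other) makes `static_order()` raise `graphlib.CycleError`, which is why dependency definitions of this kind need to form a directed acyclic graph.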

Streaming Data¶

These DataOS Resources are designed for stream data processing, handling real-time data flows and continuous data ingestion.

Workflow

Manages stream data processing tasks by running them as micro-batches.
Service

Represents a long-running process that acts as a receiver and/or provider of APIs.
Worker

Represents a long-running process responsible for performing specific tasks or computations indefinitely.
Stacks

Key Stacks in this category are Flare and Benthos.

Storage¶

This category includes DataOS Resources that provide efficient and scalable data storage.

Volume

Provides persistent shared storage for Pod containers.
Lakehouse

A fully managed storage architecture that blends the strengths of data lakes and data warehouses.
Database

Acts as a repository for storing transaction data, utilizing a managed Postgres relational database.
Stacks

The key Stack in this category is Beacon.

Observability¶

These key DataOS Resources are essential for tracking system performance and managing alerts, providing visibility into the health and status of the data infrastructure.

Monitor

Ensures system reliability and performance through observability and incident management.
Pager

Enables developers to define criteria for identifying incidents from a stream, delivering alerts based on specified conditions.

Security¶

These DataOS Resources ensure data security and access control, managing sensitive information and enforcing policies for data protection.

Instance-Secret

Designed for securely storing sensitive information at the DataOS instance level, reducing exposure risks in application code or manifest files.
Secret

Designed for secure storage of sensitive information like passwords, certificates, tokens, or keys within a DataOS Workspace.
Policy

Defines rules governing user/application behavior, enforced through Attribute-Based Access Control.
Grant

Links the Subject-Predicate-Object relationship to create access policies, granting specific system or data access.
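
The Subject-Predicate-Object model behind Policy and Grant can be illustrated with a small, self-contained Python sketch. This is not the DataOS policy engine or its manifest format: the `Rule` class, the tag names, the path-prefix matching, and the "deny by default, explicit deny wins" behavior are all assumptions chosen for illustration of attribute-based access control in general.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical access rule: who (subject) may do what (predicate) to which object."""
    subject_tags: frozenset   # attributes the requester must carry
    predicates: frozenset     # actions covered by the rule, e.g. "read"
    object_prefixes: tuple    # object paths matched by simple prefix (an assumption)
    allow: bool = True


def is_allowed(rules, requester_tags, predicate, obj):
    """Deny by default; any matching deny rule overrides matching allow rules."""
    allowed = False
    for rule in rules:
        matches = (
            rule.subject_tags <= set(requester_tags)
            and predicate in rule.predicates
            and any(obj.startswith(prefix) for prefix in rule.object_prefixes)
        )
        if matches:
            if not rule.allow:
                return False
            allowed = True
    return allowed


# Hypothetical tags and paths, for illustration only.
rules = [
    Rule(frozenset({"roles:analyst"}), frozenset({"read"}), ("/sales/",)),
    Rule(frozenset({"roles:intern"}), frozenset({"read"}), ("/sales/pii/",), allow=False),
]

print(is_allowed(rules, {"roles:analyst"}, "read", "/sales/orders"))
print(is_allowed(rules, {"roles:analyst", "roles:intern"}, "read", "/sales/pii/customers"))
print(is_allowed(rules, {"roles:analyst"}, "write", "/sales/orders"))
```

The decision depends only on the attributes carried by the requester, the requested action, and the target object, which is the core idea of attribute-based access control; how DataOS actually resolves conflicting policies should be checked in the Policy reference documentation.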

Deployment¶

These DataOS Resources streamline the deployment of data products and facilitate the packaging, distribution, and execution of applications and services.

Bundle

Standardizes the deployment of multiple Resources, data products, or applications in one operation.
Stacks

Key Stacks in this category are Container and Steampipe.

Infrastructure Resources¶

This category includes DataOS Resources for managing computational power and infrastructure configurations, essential for running your analytics and data processing workloads. They ensure optimal performance and scalability of data processing environments.

Cluster

Provides the computational resources and configurations for data engineering and analytics tasks.
Compute

Streamlines the allocation of processing power for data tasks, acting as an abstraction over node pools of similarly configured VMs.

Learning Assets¶

The following sections in the documentation help you understand various aspects of DataOS, which are essential for building data products using DataOS.

Video Tutorials

Learn more
Glossary

Learn more
Quick Guides

Learn more