DataOS Documentation

In the rapidly evolving world of data, the need for robust, scalable, and efficient data products has never been more critical. DataOS is a platform that transforms how businesses create, manage, and leverage data products. DataOS empowers business teams to innovate by providing the tools and autonomy for data product development.

DataOS is an enterprise-grade data product platform that enables organizations to build, manage, and share data products. It provides the essential building blocks that data developers require to create powerful data products that drive significant business outcomes.

DataOS development is driven by core principles tailored to address specific user needs and challenges:

  • Consumption Ready Layer: DataOS streamlines data product consumption with context-aware discovery, secure exploration, reliable quality, and multi-interface activation through its self-serve architecture.

  • Data Product Lifecycle Management: DataOS is built to serve data product consumers, including data analysts/scientists and data app developers. It aims to capture the entire data product lifecycle, integrating data product consumers, owners, developers, and administrators under one roof. This holistic approach ensures seamless integration with tools that users are already familiar with.

  • Faster Time to Value: DataOS accelerates the development process, enabling enterprises to gain insights quickly. This reduces the time to value substantially, fueling targeted campaigns, personalizing customer journeys, and enhancing profits faster.

  • AI Ready: Leveraging AI agents, DataOS enhances the user experience by providing heuristic assistance that evolves based on user feedback and needs.

  • FinOps: DataOS provides real-time insights into resource utilization, enabling organizations to monitor and control cloud spending effectively. This strategic approach promotes shared responsibility across teams, drives significant cost savings, improves operational efficiency, and facilitates informed financial decisions.

DataOS continuously evolves to meet the real-world needs of data professionals. It significantly lowers the total cost of ownership by streamlining data operations through task automation, minimizing data movement, and simplifying maintenance.

If you are new to DataOS, dataos.info is your go-to place for getting started with key technical concepts. Understanding these will help you develop data products efficiently with DataOS. For experienced developers looking to build solutions, how-to guides and reference docs are available to get you up to speed with data product creation. You'll find everything you need right here, whether you are just starting out or diving deep.

The documentation website features a top menu bar with options such as Getting Started, Data Products, Glossary, and Videos. A multi-level index is displayed in the left menu, allowing users to explore and dive deeper into specific topics within each category.

First Steps

The following sections in the top menu bar of the documentation will help you get started with DataOS.

  • Getting Started


    Get hands-on and up to speed with DataOS, and familiarize yourself with its capabilities.

    Learn more

  • Data Product


    Learn to create, deploy & manage domain-specific data products at scale.

    Explore more

Core Aspects

The following sections of the documentation, located on the left menu, provide detailed insights into the DataOS philosophy and its architecture.

  • Philosophy


    Understand the philosophy behind DataOS, designed to simplify and abstract the complexities of traditional data infrastructure.

    Read more

  • Architecture


    Learn about the architecture of DataOS, built to democratize data and accelerate data product creation.

    See more

Understand Interfaces

The Interfaces section in the documentation introduces various ways to interact with DataOS services and components.

  • Command Line Interface (CLI)


    The DataOS CLI enables efficient data operations, providing quick, flexible access to system functions for data engineers and administrators.

    Learn more

  • Graphical User Interface (GUI)


    The Graphical User Interface provides an intuitive and visually appealing method for interacting with DataOS and its components.

    Learn more

  • Application Programming Interface (API & SDK)


    The Application Programming Interface in DataOS enables seamless interaction with its core components and libraries, supporting the creation of diverse applications and services.

    Learn more

Understand DataOS Resources

The Resources section of the documentation will help you understand the primitives of DataOS that power the core functionalities necessary in any data stack. We have mapped these DataOS Resources to their functional role in the data stack:

Source Connectivity and Metadata Management

This category includes DataOS Resources that facilitate the connection to various data sources, scan the metadata, and run quality checks & data profiling.

  • Depot


    Connects various data sources to DataOS, abstracting underlying complexities.

  • Stacks


    Acts as an execution engine and integrates new programming paradigms. Key Stacks in this category are Soda and Scanner.

Data Movement and Processing

This category includes DataOS Resources that facilitate the movement and transformation of data.

Batch Data

These DataOS Resources support batch data processing, enabling scheduled, large-scale data transformations and movements.

  • Workflow


    Manages batch data processing tasks with dependencies.

  • Operator


    Standardizes orchestration of external resources, enabling programmatic actions from DataOS interfaces.

  • Stacks


    Key Stacks in this category are Flare, DBT, and Data Toolbox.
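
To illustrate what "batch data processing tasks with dependencies" means in general, the sketch below orders a few jobs so that each one runs only after the jobs it depends on. It is a generic Python illustration, not the DataOS Workflow manifest format or API; the job names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job names. Each key maps to the set of jobs it depends on.
dependencies = {
    "ingest": set(),
    "transform": {"ingest"},
    "quality_check": {"transform"},
    "publish": {"transform", "quality_check"},
}


def execution_order(deps):
    """Return one valid run order in which every job follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A cycle in the dependency graph makes `static_order()` raise `graphlib.CycleError`, which is the usual reason orchestrators require jobs to form a directed acyclic graph.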

Streaming Data

These DataOS Resources are designed for stream data processing, handling real-time data flows and continuous data ingestion.

  • Workflow


    Manages stream data processing tasks by running them as micro-batches.

  • Service


    Represents a long-running process that acts as a receiver and/or provider of APIs.

  • Worker


    Represents a long-running process responsible for performing specific tasks or computations indefinitely.

  • Stacks


    Key Stacks in this category are Flare and Benthos.
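
The idea of handling a stream as micro-batches can be sketched in a few lines: records are pulled from a continuous source and grouped into small fixed-size batches, each of which is then processed like an ordinary batch job. This is a generic illustration under assumed names (`micro_batches`, `batch_size`), not how any DataOS Stack implements it.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield consecutive lists of up to batch_size records from an iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the source is exhausted
        yield batch


# A finite range stands in for a real, unbounded event stream.
batches = list(micro_batches(range(7), 3))
print(batches)
```

Real streaming engines typically also close a batch on a time trigger, so a slow source does not hold records back indefinitely; that part is omitted here.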

Storage

This category includes DataOS Resources for providing efficient & scalable data storage.

  • Volume


    Provides persistent shared storage for Pod containers.

  • Lakehouse


    A fully managed storage architecture that blends the strengths of data lakes and data warehouses.

  • Database


    Acts as a repository for storing transaction data, utilizing a managed Postgres relational database.

  • Stacks


    The key Stack in this category is Beacon.

Observability

These key DataOS Resources are essential for tracking system performance and managing alerts, providing visibility into the health and status of the data infrastructure.

  • Monitor


    Ensures system reliability and performance through observability and incident management.

  • Pager


    Enables developers to define criteria for identifying incidents from a stream, delivering alerts based on specified conditions.

Security

These DataOS Resources ensure data security and access control, managing sensitive information and enforcing policies for data protection.

  • Instance-Secret


    Designed for securely storing sensitive information at the DataOS instance level, reducing exposure risks in application code or manifest files.

  • Secret


    Designed for secure storage of sensitive information like passwords, certificates, tokens, or keys within a DataOS Workspace.

  • Policy


    Defines rules governing user/application behavior, enforced through Attribute-Based Access Control.

  • Grant


    Links the Subject-Predicate-Object relationship to create access policies, granting specific system or data access.
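
As a rough mental model of a Subject-Predicate-Object rule evaluated with attribute-based checks, the sketch below allows a request when the user carries every tag a rule requires and the action and resource match. The `Rule` class, `is_allowed` function, tag strings, and resource path are all hypothetical and do not reflect the actual DataOS Policy or Grant schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    subject_tags: frozenset  # attributes the requesting user or app must carry
    predicate: str           # the action, for example "read"
    object_path: str         # the resource the action targets


def is_allowed(rules, user_tags, action, resource):
    """Allow the request if any rule matches the action and resource and
    the user carries all of that rule's subject tags."""
    tags = frozenset(user_tags)
    return any(
        rule.predicate == action
        and rule.object_path == resource
        and rule.subject_tags <= tags
        for rule in rules
    )


# Hypothetical rule: analysts may read the sales dataset.
rules = [Rule(frozenset({"role:analyst"}), "read", "dataset/sales")]

print(is_allowed(rules, {"role:analyst", "team:emea"}, "read", "dataset/sales"))
print(is_allowed(rules, {"role:analyst"}, "write", "dataset/sales"))
```

Requests that match no rule are denied by default in this sketch, a common choice for access control.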

Deployment

These DataOS Resources streamline the deployment of data products and facilitate the packaging, distribution, and execution of applications and services.

  • Bundle


    Standardizes the deployment of multiple Resources, data products, or applications in one operation.

  • Stacks


    Key Stacks in this category are Container and SteamPipe.

Infrastructure Resources

This category includes DataOS Resources for managing computational power and infrastructure configurations, essential for running your analytics and data processing workloads. They ensure optimal performance and scalability of data processing environments.

  • Cluster


    Provides the computational resources and configurations for data engineering and analytics tasks.

  • Compute


    Streamlines the allocation of processing power for data tasks, acting as an abstraction over node pools of similarly configured VMs.

Learning Assets

The following sections in the documentation help you understand various aspects of DataOS, which are essential for building data products using DataOS.