DataOS Documentation¶
In the rapidly evolving world of data, the need for robust, scalable, and efficient data products has never been more critical. DataOS is a platform that transforms how businesses create, manage, and leverage data products. DataOS empowers business teams to innovate by providing the tools and autonomy for data product development.
DataOS is an enterprise-grade data product platform that enables organizations to build, manage, and share data products. It provides the essential building blocks that data developers need to create powerful data products that drive significant business outcomes.
DataOS development is driven by core principles tailored to address specific user needs and challenges:
-
Consumption Ready Layer: DataOS streamlines data product consumption with context-aware discovery, secure exploration, reliable quality, and multi-interface activation through its self-serve architecture.
-
Data Product Lifecycle Management: DataOS is built to serve data product consumers, including data analysts/scientists and data app developers. It aims to capture the entire data product lifecycle, integrating data product consumers, owners, developers, and administrators under one roof. This holistic approach ensures seamless integration with tools that users are already familiar with.
-
Faster Time to Value: DataOS accelerates the development process, enabling enterprises to gain insights quickly. This reduces the time to value substantially, fueling targeted campaigns, personalizing customer journeys, and enhancing profits faster.
-
AI Ready: Leveraging AI agents, DataOS enhances the user experience by providing heuristic assistance that evolves based on user feedback and needs.
-
FinOps: DataOS provides real-time insights into resource utilization, enabling organizations to monitor and control cloud spending effectively. This strategic approach promotes shared responsibility across teams, drives significant cost savings, improves operational efficiency, and facilitates informed financial decisions.
DataOS continuously evolves to meet the real-world needs of data professionals. It significantly lowers the total cost of ownership by streamlining data operations through task automation, minimizing data movement, and simplifying maintenance.
Navigating Documentation¶
If you are new to DataOS, dataos.info is your go-to place for getting started with key technical concepts. Understanding these will help you develop data products efficiently with DataOS. For experienced developers looking to build solutions, how-to guides and reference docs are available to get you up to speed with data product creation. You'll find everything you need right here, whether you are just starting out or diving deep.
The documentation website features a top menu bar with options such as Getting Started, Data Products, Glossary, and Videos. A multi-level index is displayed in the left menu, allowing users to explore and dive deeper into specific topics within each category.
First Steps¶
The following sections in the top menu bar of the documentation will help you get started with DataOS.
-
Getting Started
Get hands-on and up to speed with DataOS, and familiarize yourself with its capabilities.
-
Data Product
Learn to create, deploy & manage domain-specific data products at scale.
Core Aspects¶
The following sections of the documentation, located on the left menu, provide detailed insights into the DataOS philosophy and its architecture.
Understand Interfaces¶
The Interfaces section in the documentation introduces various ways to interact with DataOS services and components.
-
Command Line Interface (CLI)
The DataOS CLI enables efficient data operations, providing quick, flexible access to system functions for data engineers and administrators.
-
Graphical User Interface (GUI)
The Graphical User Interface provides an intuitive and visually appealing method for interacting with DataOS and its components.
-
Application Programming Interface (API & SDK)
The Application Programming Interface in DataOS provides seamless interaction with its core components and libraries, enabling the creation of diverse applications and services.
Understand DataOS Resources¶
The Resources section of the documentation helps you understand the primitives of DataOS that power the core functionalities necessary in any data stack. These DataOS Resources are mapped to their functional role in the data stack:
Source Connectivity and Metadata Management¶
This category includes DataOS Resources that facilitate the connection to various data sources, scan the metadata, and run quality checks & data profiling.
Data Movement and Processing¶
This category includes DataOS Resources that facilitate the movement and transformation of data.
Batch Data¶
These DataOS Resources support batch data processing, enabling scheduled, large-scale data transformations and movements.
Streaming Data¶
These DataOS Resources are designed for stream data processing, handling real-time data flows and continuous data ingestion.
-
Manages stream data processing tasks by running them as micro-batches.
-
Represents a long-running process that acts as a receiver and/or provider of APIs.
-
Represents a long-running process responsible for performing specific tasks or computations indefinitely.
Storage¶
This category includes DataOS Resources for providing efficient & scalable data storage.
-
Provides persistent shared storage for Pod containers.
-
A fully managed storage architecture that blends the strengths of data lakes and data warehouses.
-
Acts as a repository for storing transaction data, utilizing a managed Postgres relational database.
-
The key Stack in this category is Beacon.
Observability¶
These key DataOS Resources are essential for tracking system performance and managing alerts, providing visibility into the health and status of the data infrastructure.
Security¶
These DataOS Resources ensure data security and access control, managing sensitive information and enforcing policies for data protection.
-
Designed for securely storing sensitive information at the DataOS instance level, reducing exposure risks in application code or manifest files.
-
Designed for secure storage of sensitive information like passwords, certificates, tokens, or keys within a DataOS Workspace.
-
Defines rules governing user/application behavior, enforced through Attribute-Based Access Control.
-
Links the Subject-Predicate-Object relationship to create access policies, granting specific system or data access.
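The Subject-Predicate-Object relationship mentioned above can be illustrated with a minimal Python sketch. This is not the DataOS Policy or Grant implementation or its manifest format: the `AccessRule` class, the `is_allowed` function, the tag strings, and the deny-overrides behavior are all hypothetical and chosen only to show how an attribute-based check might combine the three parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical rule: who (subject tags) may do what (predicate) to which resource (object)."""
    subject_tags: frozenset  # attributes the requester must carry
    predicate: str           # requested action, e.g. "read"
    object_path: str         # resource the action targets
    allow: bool = True       # False marks an explicit deny rule


def is_allowed(rules, requester_tags, predicate, object_path):
    """Return True if at least one rule matches and none of the matching rules is a deny.

    Deny-overrides is an assumption of this sketch, not documented DataOS behavior.
    """
    matched = [
        rule for rule in rules
        if rule.subject_tags <= set(requester_tags)  # requester carries every required tag
        and rule.predicate == predicate
        and rule.object_path == object_path
    ]
    return bool(matched) and all(rule.allow for rule in matched)


# Hypothetical tags and resource path, for illustration only.
rules = [AccessRule(frozenset({"roles:analyst"}), "read", "dataset/sales")]

analyst_can_read = is_allowed(rules, {"roles:analyst", "users:alice"}, "read", "dataset/sales")
guest_can_read = is_allowed(rules, {"users:bob"}, "read", "dataset/sales")
analyst_can_write = is_allowed(rules, {"roles:analyst"}, "write", "dataset/sales")
```

In this sketch, access depends on the attributes a requester carries rather than on a fixed user list, which is the general idea behind Attribute-Based Access Control.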
Deployment¶
These DataOS Resources streamline the deployment of data products and facilitate the packaging, distribution, and execution of applications and services.
Infrastructure Resources¶
This category includes DataOS Resources for managing computational power and infrastructure configurations, essential for running your analytics and data processing workloads. They ensure optimal performance and scalability of data processing environments.
Learning Tracks¶
Explore role-based learning tracks to master the essential DataOS capabilities for creating, managing, and consuming Data Products. These paths let you focus on the training and knowledge areas most relevant to your role. With practical insights and step-by-step guidance, these learning tracks streamline your journey and help you unlock the full potential of Data Products.
-
Data Product Consumer
-
Data Product Developer
-
DataOS Operator
Learning Assets¶
To further support your journey, the following sections in the documentation help you understand various aspects of DataOS, which are essential for building Data Products using DataOS.
-
Glossary
-
Quick Start Guides
-
Video Tutorials