Data Product¶
A Data Product is a self-contained unit within DataOS designed for handling and sharing analytical data, developed and managed by dedicated teams. It includes metadata, data transformation code, input and output definitions, discovery and observability, APIs, documentation, service level objectives (SLOs), governance, and platform dependencies such as compute and storage resources. A Data Product is reusable, composable, portable, and cloud-agnostic.
DataOS provides the platform for the development, management, processing, and deployment of Data Products across an organization. It provides a streamlined approach to handling Data Products throughout their entire lifecycle, from ingestion and storage to analysis and delivery. By integrating these functionalities into a single, cohesive system, DataOS enhances decision-making and boosts operational efficiency.
A Data Product is an integrated and self-contained combination of data, metadata, semantics and templates. It includes access and logic-certified implementation for tackling specific data and analytics (D&A) scenarios and reuse. A Data Product must be consumption-ready (trusted by consumers), up to date (by engineering teams) and approved for use (governed). Data Products enable various D&A use cases, such as data sharing, data monetization, analytics and application integration.
- Gartner®
- How to develop a Data Product?
  Learn how to develop and manage a Data Product within DataOS.
- Learn more about the Data Product
  Learn about the key facets, characteristics, personas, and types of the Data Product.
- Data Product Examples
  Explore examples showcasing how an actual Data Product is developed.
Data Product Architecture¶
The architecture of a Data Product within DataOS comprises several components designed to facilitate data consumption and deliver business value. This section outlines the primary ports of a Data Product and introduces the additional Experience Ports offered by DataOS.
Input Ports¶
Input Ports are responsible for receiving the data that forms the core of the Data Product. They specify the data format (e.g., CSV, JSON, Parquet) and protocol (e.g., HTTP, FTP, JDBC) required to ingest data from operational source systems or other Data Products. A Data Product can have one or many Input Ports, depending on the number of data sources.
Output Ports¶
Output Ports define how the data is exposed and consumed by external systems or users. They outline the format and consumption protocol for making data available to stakeholders. They specify how data can be queried or accessed (e.g., REST API, SQL query) and may support various formats depending on consumption needs (e.g., JSON, XML).
Control Ports¶
Control Ports are used for monitoring, logging, and managing the Data Product. They also provide metadata and descriptive information about the Data Product. These ports facilitate performance tracking and operational metrics through monitoring and logging. They offer access to metadata such as ownership, organizational unit, licensing, and versioning. Additionally, they provide integration with a data marketplace, offering public and self-description information.
Experience Ports¶
Experience Ports are provided by DataOS to support additional consumption paradigms beyond the standard input, output, and control functionalities. They enable specialized access methods such as BI tools, AI integrations, and data applications. Examples include exposing the Data Product via a REST API using Talos, creating and managing a semantic model with DataOS’s Lens for improved data understanding, and implementing a chat interface using Lens-LLM systems for natural language interactions with the data.
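The four port types described above can be pictured as a small data model. The following is a minimal Python sketch for illustration only: `PortKind`, `Port`, and `DataProductPorts` are hypothetical names, not DataOS types, and the rule that a product needs at least one Input Port and one Output Port is an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class PortKind(Enum):
    INPUT = "input"
    OUTPUT = "output"
    CONTROL = "control"
    EXPERIENCE = "experience"


@dataclass(frozen=True)
class Port:
    """Hypothetical port descriptor (not a DataOS type)."""
    kind: PortKind
    name: str
    protocol: str          # e.g. "JDBC", "REST API", "SQL query"
    data_format: str = ""  # e.g. "Parquet", "JSON"; empty when not applicable


@dataclass
class DataProductPorts:
    """Groups the ports of one Data Product."""
    ports: list[Port] = field(default_factory=list)

    def of_kind(self, kind: PortKind) -> list[Port]:
        # Return every port of the requested kind, in declaration order.
        return [p for p in self.ports if p.kind is kind]

    def is_consumable(self) -> bool:
        # Assumption for this sketch: at least one input and one output port.
        return bool(self.of_kind(PortKind.INPUT)) and bool(self.of_kind(PortKind.OUTPUT))


# Illustrative values only; the port names are made up.
traffic = DataProductPorts([
    Port(PortKind.INPUT, "web-events", protocol="JDBC", data_format="Parquet"),
    Port(PortKind.OUTPUT, "traffic-api", protocol="REST API", data_format="JSON"),
    Port(PortKind.EXPERIENCE, "semantic-model", protocol="Lens"),
])
print(traffic.is_consumable())
print([p.name for p in traffic.of_kind(PortKind.EXPERIENCE)])
```

Keeping the port kind as an explicit field makes it straightforward to list, for example, only the Experience Ports of a product without changing how the other ports are described.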
The following sections outline the thought process involved in developing a Data Product, from defining use cases to deployment.
Define Use Cases¶
The development of a Data Product begins with defining the use cases. A single Data Product can cater to multiple use cases, and vice versa. For example, suppose our use case is to analyze website traffic sources. This analysis provides actionable insights, enabling data-driven decision-making to optimize marketing strategies and improve business outcomes. The intended audience includes data analysts, marketing teams, business stakeholders, and the technical teams responsible for Data Product development. The requirements for this use case include access to the data source, an ETL (Extract, Transform, Load) process to clean and transform raw data, a data model to structure the transformed data, and visualization tools to present the analysis results. Additionally, secure data handling and storage must be ensured throughout the process.
Explore and Discover Data Products¶
Once use cases have been defined, the next step is to explore the existing Data Products available in the Data Product Hub. If the available Data Products sufficiently address the use cases, there is no need to develop a new Data Product. However, if the existing Data Products do not meet the requirements of the use cases, we can proceed to the Data Product Development Life Cycle to create a new Data Product.
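The reuse-before-build decision above amounts to a coverage check: does any existing Data Product provide everything the use case requires? The sketch below illustrates that reasoning with an in-memory catalog. The catalog contents, capability labels, and function names are all hypothetical, and the sketch does not call the Data Product Hub.

```python
def find_covering_products(required: set[str], catalog: dict[str, set[str]]) -> list[str]:
    """Return the names of products whose capabilities include every requirement."""
    return sorted(name for name, provided in catalog.items() if required <= provided)


def next_step(required: set[str], catalog: dict[str, set[str]]) -> str:
    """Decide whether to reuse an existing product or start a new one."""
    matches = find_covering_products(required, catalog)
    if matches:
        return f"reuse {matches[0]}"
    return "start the development life cycle"


# Hypothetical catalog: product name -> capabilities it already offers.
catalog = {
    "customer-360": {"customer-profile", "purchase-history"},
    "web-analytics": {"sessions", "traffic-source", "page-views"},
}

print(next_step({"traffic-source", "sessions"}, catalog))
print(next_step({"traffic-source", "ad-spend"}, catalog))
```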
Data Product Development Life Cycle¶
The Data Product Development Life Cycle consists of four key phases: Design, Develop, Deploy, and Iterate. It starts with Design, where business goals are translated into a solution architecture. The Develop phase involves building and testing the Data Product based on this design. Deploy focuses on releasing the product to users and ensuring it operates effectively in a production environment. Finally, Iterate emphasizes continuous improvement through feedback and performance analysis to adapt to evolving needs and enhance the product over time. For details, refer to the Data Product Development Life Cycle documentation.
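The four phases can be read as an ordered cycle. The short sketch below encodes that order; treating Iterate as leading back into a new Design pass is an assumption made for illustration, and `Phase` and `next_phase` are hypothetical names.

```python
from enum import Enum


class Phase(Enum):
    DESIGN = "design"
    DEVELOP = "develop"
    DEPLOY = "deploy"
    ITERATE = "iterate"


# Enum members iterate in definition order, which matches the life cycle order.
_ORDER = list(Phase)


def next_phase(current: Phase) -> Phase:
    """Return the phase that follows `current`."""
    # Assumption: feedback gathered in Iterate feeds a new Design pass.
    idx = _ORDER.index(current)
    return _ORDER[(idx + 1) % len(_ORDER)]


print(next_phase(Phase.DEPLOY).value)
```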
Structure of Data Product Manifest¶
A Data Product manifest outlines essential metadata and configuration details about a Data Product. This structure can be modified based on specific requirements and additional metadata needed for the Data Product.
# Product meta section
name: {{dp-test}} # Product name (Mandatory)
version: {{v1alpha}} # Manifest version (Mandatory)
type: {{data}} # Product type (Mandatory)
tags: # Tags (Optional)
  - {{data-product}}
  - {{dataos:type:product}}
  - {{dataos:product:data}}
description: {{the customer 360 view of the world}} # Description of the product (Optional)
purpose: {{This data product is intended to provide insights into the customer for strategic decisions on cross-selling additional products.}} # Purpose (Optional)
collaborators: # Collaborators' user IDs (Optional)
  - {{thor}}
  - {{blackwidow}}
  - {{loki}}
owner: {{iamgroot}} # Owner (Optional)
refs: # References (Optional)
  - title: {{Bundle Info}} # Reference title (Mandatory if adding a reference)
    href: {{https://dataos.info/resources/bundle/}} # Reference link (Mandatory if adding a reference)
entity: {{product}} # Entity (Mandatory)
# Data Product-specific section (Mandatory)
v1alpha: # Data Product version
  data:
    resources: # Resource-specific section (Mandatory)
      - name: {{bundle-dp}} # Resource name (Mandatory)
        type: {{bundle}} # Resource type (Mandatory)
        version: {{v1beta}} # Resource version (Mandatory)
        refType: {{dataos}} # Resource reference type (Mandatory)
        workspace: {{public}} # Workspace (Requirement depends on the resource type)
        description: {{this bundle resource is for a data product}} # Resource description (Optional)
        purpose: {{deployment of data product resources}} # Purpose of the required resource (Optional)
    inputs: # Input-specific section (Mandatory)
      - description: Sales 360
        purpose: source
        refType: dataos
        ref: dataos://bigquery:PUBLIC/MYTABLE
    outputs: # Output-specific section (Mandatory)
      - description: Customer
        purpose: consumption
        refType: dataos_address
        ref: dataos://icebase:sandbox/sales?acl=rw
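As a rough illustration of how the mandatory markers in the manifest above could be checked before applying it, the sketch below works on the manifest after it has been loaded into a plain Python dictionary. It is not a DataOS tool: `missing_fields` and `parse_dataos_address` are hypothetical helpers, the list of required keys is taken only from the comments in the sample manifest, and the sketch assumes that the version-specific section is keyed by the value of `version`. The `depot`, `collection`, and `dataset` labels are likewise an assumed reading of the `dataos://` addresses shown in the `ref` fields.

```python
from urllib.parse import parse_qsl

# Keys that the comments in the sample manifest mark as mandatory.
REQUIRED_TOP_LEVEL = ("name", "version", "type", "entity")
REQUIRED_DATA_SECTIONS = ("resources", "inputs", "outputs")
REQUIRED_RESOURCE_KEYS = ("name", "type", "version", "refType")


def missing_fields(manifest: dict) -> list[str]:
    """Return dotted paths of mandatory fields that are absent or empty."""
    missing = [key for key in REQUIRED_TOP_LEVEL if not manifest.get(key)]
    # Assumption: the version-specific section is keyed by the `version` value.
    version = manifest.get("version") or "<version>"
    data = (manifest.get(version) or {}).get("data") or {}
    for section in REQUIRED_DATA_SECTIONS:
        if not data.get(section):
            missing.append(f"{version}.data.{section}")
    for i, resource in enumerate(data.get("resources") or []):
        for key in REQUIRED_RESOURCE_KEYS:
            if not resource.get(key):
                missing.append(f"{version}.data.resources[{i}].{key}")
    return missing


def parse_dataos_address(address: str) -> dict:
    """Split an address shaped like dataos://<depot>:<collection>/<dataset>?<params>."""
    prefix = "dataos://"
    if not address.startswith(prefix):
        raise ValueError(f"not a dataos:// address: {address!r}")
    rest, _, query = address[len(prefix):].partition("?")
    location, _, dataset = rest.partition("/")
    depot, _, collection = location.partition(":")
    return {
        "depot": depot,
        "collection": collection,
        "dataset": dataset,
        "params": dict(parse_qsl(query)),
    }


# A manifest dictionary with an empty `outputs` section, to show the check firing.
manifest = {
    "name": "dp-test",
    "version": "v1alpha",
    "type": "data",
    "entity": "product",
    "v1alpha": {
        "data": {
            "resources": [
                {"name": "bundle-dp", "type": "bundle", "version": "v1beta", "refType": "dataos"}
            ],
            "inputs": [{"refType": "dataos", "ref": "dataos://bigquery:PUBLIC/MYTABLE"}],
            "outputs": [],
        }
    },
}

print(missing_fields(manifest))
print(parse_dataos_address("dataos://icebase:sandbox/sales?acl=rw"))
```

A check like this only confirms that the fields are present. It does not validate their values or confirm that the referenced resources exist.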
Configurations¶
A Data Product can be configured so that business decisions are based on reliable data. For a detailed breakdown of each attribute, refer to the documentation: Attributes of Data Product manifest
Recipes¶
This section provides step-by-step guides to help you configure a Data Product and solve common challenges. The following recipes are available:
- How to Create Data Product template using Cookiecutter?
- How to Deploy Data Product using CI/CD pipeline?
Examples¶
This section provides practical, real-world scenarios demonstrating how to develop a Data Product. Below are some examples to help you understand the Data Product: