Learn

Welcome to the DataOS Learning Hub!

We have designed the DataOS Learning Hub to cater to your specific role and expertise level within the DataOS ecosystem. Our learning tracks are tailored to meet the needs of different personas, ensuring you receive the knowledge and skills required to excel. To further support your journey, we have made quick start guides and instructional videos available.

Learning tracks

Learning tracks are created to meet the unique needs of different user personas. These tracks offer learning paths to help individuals acquire the necessary skills to leverage DataOS capabilities—whether it's creating, managing, or consuming Data Products.

Each learning track is organized into modules, which further break down into topics. These topics combine quick concepts, practical scenarios, code snippets, and visuals to make your learning experience engaging and efficient.

Choose a learning path that suits your role:

  • Data Product Consumer


    Crafted to help you gain a deeper understanding of how to work with Data Products. You'll develop the skills necessary to explore, analyze, and utilize Data Products effectively in your role, whether you're a Data Analyst, Data Scientist, or Business Analyst.

    Learn more

  • Data Product Developer


    Designed to equip you with the skills needed to create, manage, and scale Data Products using DataOS. Whether it’s understanding business requirements or diving into the technical nitty-gritty of data pipelines, access control, quality checks, and more, this track covers all the essentials for your role.

    Learn more

  • DataOS Operator


    Created to empower you with the knowledge and skills necessary to effectively manage the DataOS platform. As a DataOS Operator, you are responsible for overseeing the platform’s infrastructure, compute resources, data security, and compliance.

    Learn more

Data Product Consumer

Data Product Consumers in DataOS span a variety of roles, such as Data Analysts, Business Analysts, and Data Scientists. Analysts use DataOS to discover, explore, and activate Data Products, turning raw data into actionable insights and business intelligence that drive strategic decision-making and innovation. Data Scientists apply advanced analytical techniques and machine learning algorithms to extract meaningful insights from data within DataOS.

Key responsibilities

Here are the key responsibilities of a Data Product Consumer, though specific tasks may vary depending on the role or initiative:

  • Discovering and accessing Data Products: Identify and access relevant Data Products based on business needs. Interpret metadata to understand product details and assess the usability of Data Products for informed decision-making.

  • Navigating semantic models: Understand the relationships between data entities within semantic models to improve data comprehension.

  • Checking data quality: Evaluate Data Products for accuracy, consistency, and completeness, ensuring high-quality analysis and decision-making (a generic sketch follows this list).

  • Understanding governance and policies: Ensure data usage and access align with organizational security standards and regulations.

  • Activating Data Products: Consider how Data Products can be consumed with Business Intelligence (BI) tools, APIs, and other applications to enhance workflows and reporting.

  • Tracking metrics and performance: Monitor performance, usage, and impact metrics of Data Products to assess their effectiveness and communicate results to stakeholders.
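
As a generic illustration of what such a quality evaluation can look like in practice (not DataOS's built-in quality-check mechanism, which the Checking Data Quality module below covers), here is a minimal pandas sketch that tests completeness and consistency on a hypothetical extract of a Data Product's output:

```python
import pandas as pd

# Hypothetical extract of a Data Product's output dataset.
df = pd.read_csv("customer_orders.csv")  # placeholder file name

# Completeness: share of missing values per column.
null_rates = df.isna().mean()
print(null_rates[null_rates > 0])

# Consistency: the primary key column should have no duplicates.
duplicates = df["order_id"].duplicated().sum()  # placeholder column name
assert duplicates == 0, f"{duplicates} duplicate order_id values found"
```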

Modules overview

In this learning track, you will get a comprehensive introduction to Data Products, covering their types and importance in driving insights. You'll learn to navigate the Data Product Hub (DPH), access essential data product information, analyze input/output for meaningful insights, explore semantic models, assess data quality, and understand governance policies for data security.

Click here for details on the Data Product Consumer learning track modules.
Module 1: Understanding Data Products
Get a solid foundation on what Data Products are and how they can drive insights and decision-making. Learn about their features and importance in business processes.
  • Introduction to Data Products: Understand how Data Products transform raw data into valuable insights, enabling data-driven decisions.
  • Features and Importance of Data Products: Learn the key features that make Data Products indispensable for data consumers: scalability, real-time access, and usability.

Module 2: Discovering Data Products on DPH
Learn how to navigate the Data Product Hub (DPH) to find Data Products that meet your needs using search, filters, tags, and categories.
  • Introduction to Data Product Hub: Learn to navigate the Data Product Hub and get to know Perspectives and Metrics.
  • Discover Data Products of Interest: Learn how to identify the most relevant Data Product for solving your specific use case efficiently.

Module 3: Viewing Data Product Info
Access key details of the Data Product, such as contributors, tier, type, and tags, along with links to the relevant Git repository and Jira, for easy reference, collaboration, and informed decisions on Data Product usage.
  • Get the Details of the Data Product of Interest: Examine key details of the Data Product to evaluate its suitability for your use case.

Module 4: Exploring Input and Output Data
Explore the input and output datasets that are fed into or generated by the Data Product.
  • Know About Input and Output Datasets: Understand the schemas of the input and output datasets. Use Metis to access detailed metadata and Workbench for advanced data exploration and querying.

Module 5: Navigating Semantic Models
Explore semantic models to understand relationships between data entities and improve data integration and comprehension.
  • Exploring Semantic Models: Visualize how data flows from input datasets to create meaningful metrics. Understand the data flow, relationships, and transformations that drive insights.

Module 6: Checking Data Quality
Learn how to assess data quality through key factors like accuracy, consistency, and timeliness to ensure reliable analysis.
  • Understanding the Quality Checks: View the quality checks applied to ensure the Data Product meets data standards.

Module 7: Managing Data Governance
Understand the governance policies and compliance standards implemented with Data Products to ensure data security and integrity.
  • Understanding Access Policy: Learn about the access policies implemented for the Data Product to manage user permissions and control access.

Module 8: Integrating Data Products with BI Tools and Applications
Unlock the power of Data Products by connecting them to BI tools. Learn to use the Data Product in Jupyter Notebooks for AI/ML development, query data via Postgres or GraphQL, and integrate with your applications using flexible APIs (a consumption sketch follows this list).
  • Integration with BI Tools: Connect Data Products with BI tools for visualization and reporting.
  • Integration with AI and ML: Explore strategies for integrating Data Products with AI and machine learning frameworks.
  • Integration with Postgres: Learn methods for connecting to Data Products through a Postgres interface.
  • Integration with GraphQL: Use GraphQL to query Data Products.
  • Integration with Data APIs: Use Data APIs for programmatic access to Data Products.
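
Below is a minimal Python sketch of the Postgres and GraphQL consumption modes, assuming the Data Product exposes Postgres-compatible and GraphQL endpoints as this module describes. Every hostname, port, credential, table, and query here is a hypothetical placeholder; the real connection details come from the Data Product Hub's integration options.

```python
import psycopg2   # PostgreSQL client
import requests   # HTTP client for the GraphQL call

# --- Postgres interface (all connection details are placeholders) ---
conn = psycopg2.connect(
    host="tcp.dataos-instance.example.com",  # hypothetical host
    port=5432,
    user="my-user",                          # hypothetical credentials
    password="my-api-token",
    dbname="lens",                           # hypothetical database name
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT * FROM customer_orders LIMIT 10")  # hypothetical table
    for row in cur.fetchall():
        print(row)

# --- GraphQL interface (URL and query are placeholders) ---
resp = requests.post(
    "https://dataos-instance.example.com/graphql",     # hypothetical URL
    json={"query": "{ customerOrders { orderId } }"},  # hypothetical query
    headers={"Authorization": "Bearer my-api-token"},  # hypothetical token
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["data"])
```

The same `requests` pattern applies to Data APIs: a GET or POST against the API's documented endpoint with the appropriate authorization header.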

Start learning: Click here to access the modules.

Data Product Developer

Data Product Developers play a key role in creating, managing, and evolving Data Products within DataOS. They are responsible for building the data infrastructure that powers everything from analytics to business intelligence, making sure data flows smoothly through pipelines and stays accurate and accessible for users. They also ensure that Data Products deliver reliable insights while remaining aligned with governance policies.

Key responsibilities

Here are the key responsibilities of a Data Product Developer, though specific tasks may differ based on the role or objective:

  • Collaborate with stakeholders: Gather requirements from stakeholders and align Data Products with business objectives.

  • Design Data Products: Design semantic models, define quality and security standards, and determine how users will consume the data product.

  • Data Pipeline Management: Create data pipelines and implement data transformations to handle data ingestion efficiently.

  • Quality Assurance: Ensure data integrity through quality checks and monitoring.

  • Data Governance and Security: Apply appropriate data security and access controls, ensuring regulatory compliance.

  • Deployment and Maintenance: Deploy Data Products efficiently, monitor their performance, and manage updates using CI/CD practices.

Modules overview

The learning track for Data Product Developers is divided into modules, each focusing on an essential stage of the data product lifecycle. Every module covers key topics with step-by-step guidance, hands-on examples, and best practices, ensuring a comprehensive and practical learning experience.


Detailed module breakdown

Click here for details on the Data Product Developer learning track modules.
Module 1: Understanding Data Needs
This module focuses on grasping the business requirements that will guide the creation of the data product. Key activities include:
  • Understanding business goals: Align Data Products with overall business objectives.
  • Quality & security expectations: Identify quality standards and security protocols.
  • Collaboration with stakeholders: Work closely with business and technical teams to define needs.
  • Understanding consumption: Recognize how the data product will be consumed by end users.

Module 2: Designing Data Products
This module dives into the design phase using the DataOS Metis and Workbench tools.
  • Using DataOS Metis: Navigate Metis to explore data assets, understand data format and structure, and assess data quality while tracing its lineage.
  • Using Workbench: Conduct exploratory data analysis (EDA) to refine the data model.
  • Resource Identification: Identify the DataOS Resources required to build the product.
  • Security and Sensitivity: Identify sensitive data and establish relevant data policies.
  • Defining Quality and Service Level Objectives (SLOs): Set performance benchmarks for the data product.
  • Defining Consumption Methods: Determine how users will consume the data product.

Module 3: Building Data Products
This module covers the technical aspects of constructing the data product.
  • Creating Depots: Set up Depots for source and destination systems within DataOS.
  • Building Data Pipelines: Understand stream and batch data processing methods, write data transformations, and get introduced to data processing stacks.
  • Creating Lens Models: Develop logical data models that structure the data product.
  • Quality Checks: Implement quality checks to maintain data integrity.
  • Monitoring & Alerting: Set up monitoring and notification systems for ongoing oversight.
  • Data APIs: Create APIs to expose data for consumption by other systems.
  • Applying Access Control: Implement data policies that govern access and security.

Module 4: Deploying Data Products
The final module focuses on deploying the data product within DataOS (a deployment sketch follows this list).
  • Bundle Deployment: Use the DataOS CLI to create and apply deployment bundles.
  • Creating a Data Product Manifest File: Configure and apply the Data Product manifest file for deployment.
  • Performing Metadata Scans: Create a Scanner Workflow to provide visibility into metadata.
  • Validating the Data Product: Use CLI commands to validate the creation and configuration of the data product.
  • CI/CD: Implement continuous integration and deployment practices to streamline future updates.
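
In a CI/CD job, the manual CLI steps above can be scripted. The sketch below wraps the CLI from Python, assuming dataos-ctl is installed and already authenticated, and that it accepts an `apply -f <manifest>` invocation matching the apply pattern this module describes; the file names are placeholders.

```python
import subprocess

def apply_manifest(path: str) -> None:
    """Apply a DataOS manifest via the CLI.

    Assumes dataos-ctl is on PATH and logged in; the `apply -f` form
    follows the apply pattern described in this module.
    """
    subprocess.run(["dataos-ctl", "apply", "-f", path], check=True)

# Placeholder file names for the bundle and the Data Product manifest.
for manifest in ["bundle.yaml", "data_product_manifest.yaml"]:
    apply_manifest(manifest)
```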

Start learning: Click here to access the modules.

DataOS Operator

A DataOS Operator is the administrator responsible for managing and maintaining the DataOS platform. This role involves overseeing the system’s performance, ensuring the secure management of resources, and guaranteeing compliance with regulatory standards. The operator is the key figure who ensures the platform’s day-to-day operations run smoothly, providing a stable environment for all teams interacting with DataOS.

The DataOS Operator handles a range of tasks, from provisioning compute resources to managing access controls and system security. They are also responsible for monitoring system health, ensuring interoperability with external systems, and scaling the platform to meet growing demands. In essence, the DataOS Operator ensures the platform’s integrity and performance, allowing teams to leverage data efficiently while safeguarding critical assets.

Key responsibilities

A DataOS Operator could be an existing Forward Deployment Engineer, DevOps Engineer, or Cloud Engineer. Here are the key responsibilities of a DataOS Operator:

  • Kubernetes cluster management: Oversee and manage Kubernetes clusters to ensure the optimal performance of the DataOS platform.

  • Cloud infrastructure management: Handle deployments and resource management on cloud platforms like AWS, GCP, or Azure.

  • System monitoring: Use tools like Prometheus and Grafana to monitor system health, track performance metrics, and resolve issues proactively.

  • Access control management: Manage authentication and authorization mechanisms to enforce data governance and ensure appropriate access to resources.

  • Container management: Manage Docker containers to ensure smooth operation within DataOS' containerized environment.

  • Minerva cluster management: Optimize and manage Minerva Clusters to handle query processing and ensure efficient resource use.

  • Credential and secret management: Securely manage sensitive information, including credentials and secrets, to maintain system integrity (a short sketch follows this list).

  • Compute resource provisioning and scaling: Provision and scale compute instances based on the platform’s needs, ensuring sufficient resources for workflows, jobs, and queries.

  • Regulatory compliance: Ensure that all platform operations comply with relevant regulatory standards for security and data management.

  • System security: Maintain the security of the DataOS platform, implementing best practices for resource and data protection.
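
The credential responsibility above comes down to one coding discipline that Module 3 of this track expands on: secrets live in a secret store or the environment, never in source code. A minimal Python sketch, with hypothetical names throughout:

```python
import os

# Read the secret from the environment (populated by a secret manager
# or CI vault) instead of hardcoding it in version-controlled code.
# BAD:  db_password = "s3cr3t-in-code"
db_password = os.environ["SOURCE_DB_PASSWORD"]  # hypothetical variable

connection_settings = {
    "host": "warehouse.example.com",  # placeholder host
    "user": "dataos_operator",        # placeholder user
    "password": db_password,
}
```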

Modules overview

The learning track is divided into modules, with each module focusing on key operational areas. Every module contains specific topics that address common challenges you will encounter as a DataOS Operator, guiding you through the core aspects of this role and giving you the tools to troubleshoot efficiently.


Detailed module breakdown

Click here for details on the DataOS Operator learning track modules.
Module 1: Compute management
Learn how to manage compute resources effectively to ensure smooth operation of workflows, jobs, services, and querying processes within DataOS.
  • Managing compute resources to avoid workflow/job failures: Manage and scale compute resources so that workflows, jobs, and services do not fail due to insufficient compute availability.
  • Provisioning Minerva Clusters for querying: Learn how to troubleshoot issues related to provisioning Minerva Clusters for querying tasks, including expanding or reallocating compute resources so that queries run smoothly.

Module 2: Query cluster management
Understand how to optimize and manage query clusters to provide seamless data access and performance.
  • Optimizing query clusters for better performance: Identify and resolve issues related to underperforming query clusters, including resizing and reconfiguring clusters for optimal performance.
  • Scheduling query clusters using cron jobs: Learn how to schedule query clusters with cron jobs so that they are available at specific times for batch processes or other scheduled tasks.

Module 3: Credential security
Safeguard sensitive information by managing credentials securely within the DataOS platform.
  • Preventing credential exposure in code: Learn best practices for managing and securing credentials to prevent accidental exposure in code, including secure storage techniques and tools for credential management.

Module 4: Data source connectivity
Learn how to establish secure and stable connections to data sources while adhering to best practices for security and performance.
  • Securing data source connections: Learn to set up secure connections to various data sources, including encrypting credentials and following security best practices to protect data access.

Module 5: Access management
Ensure appropriate access control by managing user permissions and roles within the DataOS platform.
  • Granting appropriate user access: Understand the process of evaluating and granting user access requests, ensuring that permissions are allocated according to the principle of least privilege.

Module 6: System monitoring
Proactively monitor the platform using system metrics to ensure optimal performance and resolve issues before they affect operations (a monitoring sketch follows this list).
  • Monitoring system metrics for proactive issue resolution: Learn how to use monitoring tools like Prometheus and Grafana to track key system metrics and manage resource usage proactively, catching issues early and maintaining platform performance.

Module 7: Interoperability with external platforms
Ensure smooth interoperability between DataOS and external platforms by managing integrations and connections securely.
  • Managing interoperability with external platforms: Set up and maintain secure, stable connections with external platforms so that DataOS integrates seamlessly with third-party systems.

Module 8: Stack provisioning
Scale the DataOS platform by provisioning additional stacks to meet increasing resource demands.
  • Provisioning new stacks for resource scalability: Learn how to provision additional compute, storage, and networking stacks so that the platform can handle growing workloads and future demands.

Module 9: Compliance and governance
Ensure that the DataOS platform adheres to global data governance standards and regulatory requirements.
  • Maintaining compliance with data governance regulations: Understand how to maintain compliance with regulations such as GDPR, CCPA, and HIPAA, ensuring that DataOS meets all legal and governance requirements.
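
As a small illustration of the System monitoring module, the sketch below polls Prometheus's standard HTTP query API from Python and flags unreachable scrape targets. The Prometheus URL is a hypothetical placeholder; the `/api/v1/query` endpoint and the built-in `up` metric are standard Prometheus features, not DataOS-specific APIs.

```python
import requests

PROMETHEUS_URL = "https://prometheus.example.com"  # hypothetical endpoint

# Query the standard Prometheus HTTP API for the built-in `up` metric,
# which is 1 for healthy scrape targets and 0 for unreachable ones.
resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query",
    params={"query": "up"},
    timeout=10,
)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    instance = series["metric"].get("instance", "unknown")
    _, value = series["value"]  # [timestamp, value-as-string]
    if value == "0":
        print(f"ALERT: target {instance} is down")
```

In practice the same query would feed a Grafana panel or an alerting rule rather than a print statement.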

Quick start guides

Looking for a fast way to get up and running? Our Quick Start Guides provide step-by-step instructions for performing key tasks and operations within DataOS. Perfect for getting things done quickly!

Videos

Explore our Video Library to watch tutorials that cover various topics from the basics to advanced features of DataOS.