Learn

Welcome to the DataOS Learning Hub!

We have designed the DataOS Learning Hub to cater to your specific role and expertise level within the DataOS ecosystem. Our learning tracks are tailored to meet the needs of different personas, ensuring you receive the knowledge and skills required to excel. To further support your journey, we have made quick start guides and instructional videos available.

Learning tracks

Learning tracks are created to meet the unique needs of different user personas. These tracks offer learning paths to help individuals acquire the necessary skills to leverage DataOS capabilities—whether it's creating, managing, or consuming Data Products.

Each learning track is organized into modules, which further break down into topics. These topics combine quick concepts, practical scenarios, code snippets, and visuals to make your learning experience engaging and efficient.

Choose a learning path that suits your role:

  • Data Product Consumer


    Crafted to help you gain a deeper understanding of how to work with Data Products. You'll develop the skills necessary to explore, analyze, and consume Data Products effectively in your role, whether you're a Data Analyst, Data Scientist, Business Analyst, App Developer, or Product Manager.

    Learn more

  • Data Product Developer


    Designed to equip you with the skills to create, manage, and scale Data Products using DataOS, tailored for roles like Data Engineers, AI/ML Engineers, Data Modelers, and Solution Architects. From translating business needs into solutions to building pipelines, enforcing access controls, ensuring data quality, and designing scalable architectures, it covers all essentials to excel in your role.

    Learn more

  • DataOS Operator


    Created to empower roles like DataOS Operators, DevOps Engineers, Cloud Administrators, and Security Specialists with the knowledge and skills to effectively manage the DataOS platform. Responsibilities include overseeing infrastructure, managing compute resources, ensuring data security, and maintaining compliance.

    Learn more

Data Product Consumer

Data Product Consumers in DataOS encompass a variety of roles, including Data Analysts, Business Analysts, Data Scientists, App Developers, Product Managers, and AI Product Managers. Data Analysts and Business Analysts play essential roles in leveraging data for actionable insights and strategic decision-making; they use DataOS to discover, explore, and activate Data Products that deliver valuable business intelligence and drive innovation. Data Scientists apply advanced analytical techniques and machine learning algorithms to extract meaningful insights from data within DataOS.

App Developers consume Data Products to build innovative applications that enhance user experiences and expand business capabilities. Product Managers essentially bridge the gap between data capabilities and business outcomes, ensuring that Data Products serve both technical and strategic goals.

AI Product Managers unlock the potential of Data Products by leveraging large language models (LLMs) and Natural Language Processing (NLP) interfaces for insights and data-driven outcomes.

Key responsibilities

Here are the key responsibilities of a Data Product Consumer, though specific tasks may vary depending on the role or initiative:

  • Discovering and accessing Data Products: Identify and access relevant Data Products based on business needs. Interpret metadata to understand product details and assess the usability of Data Products for informed decision-making.

  • Navigating semantic models: Understand the relationships between data entities within semantic models to improve data comprehension.

  • Checking data quality: Evaluate Data Products for accuracy, consistency, and completeness, ensuring high-quality analysis and decision-making.

  • Understanding governance and policies: Ensure data usage and access align with organizational security standards and regulations.

  • Activating Data Products: Consider how Data Products can be consumed with Business Intelligence (BI) tools, APIs, and other applications to enhance workflows and reporting. Leverage designated endpoints or interfaces for efficient and secure data access.

  • Tracking metrics and performance: Monitor performance, usage, and impact metrics of Data Products to assess their effectiveness and communicate results to stakeholders.

Core modules

In this learning track, you will get a comprehensive introduction to Data Products, covering their types and importance in driving insights. You'll learn to navigate the Data Product Hub (DPH), access essential data product information, analyze input/output for meaningful insights, explore semantic models, assess data quality, and understand governance policies for data security.

Click here for details on the Data Product Consumer learning track modules.

Module 1: Understanding Data Products
Get a solid foundation on what Data Products are and how they can drive insights and decision-making. Learn about their features and their importance in business processes.
  • Introduction to Data Products: Understand how Data Products transform raw data into valuable insights, enabling data-driven decisions.
  • Features and Importance of Data Products: Learn the key features that make Data Products indispensable for data consumers—scalability, real-time access, and usability.

Module 2: Discovering Data Products
Learn how to navigate the Data Product Hub (DPH) to find Data Products that meet your needs using search, filters, tags, and categories.
  • Introduction to Data Product Hub: Navigate the Data Product Hub and get familiar with Perspectives and Metrics.
  • Discover Data Products of Interest: Learn how to identify the most relevant Data Product for your specific use case efficiently.

Module 3: Viewing Data Product Info
Access key details of the Data Product (contributors, tier, type, and tags), along with links to the relevant Git repository and Jira, for easy reference and collaboration when deciding how to use it.
  • Get the details of the Data Product of Interest: Examine key details of the Data Product to evaluate its suitability for your use case.

Module 4: Exploring Input and Output Data
Explore the input and output datasets that feed into or are generated by the Data Product.
  • Know about Input and Output datasets: Understand the schemas of the input and output datasets. Use Metis to access detailed metadata and Workbench for advanced data exploration and querying.

Module 5: Navigating Semantic Models
Explore semantic models to understand relationships between data entities and improve data integration and comprehension.
  • Exploring Semantic Models: Visualize how data flows from input datasets to create meaningful metrics. Understand the data flow, relationships, and transformations that drive insights.

Module 6: Checking Data Quality
Learn how to assess data quality through key factors like accuracy, consistency, and timeliness to ensure reliable analysis.
  • Understanding the Quality Checks: View the quality checks applied to ensure the Data Product meets data standards.

Module 7: Managing Data Governance
Understand the governance policies and compliance standards implemented with Data Products to ensure data security and integrity.
  • Understanding Access Policy: Learn about the access policies implemented for the Data Product to manage user permissions and control access.

Module 8: Integrating Data Products with BI Tools and Applications
Unlock the power of Data Products by connecting them to BI tools. Learn to use the Data Product in Jupyter Notebooks for AI/ML development, query data via Postgres or GraphQL, and integrate it with your apps using flexible APIs. A hedged consumption sketch follows this module.
  • Integration with BI tools: Connect Data Products with BI tools for visualization and reporting.
  • Integration with AI and ML: Explore strategies for integrating Data Products with AI and machine learning frameworks.
  • Integration with Postgres: Learn methods for connecting Data Products with Postgres databases.
  • Integration with GraphQL: Use GraphQL to run flexible, client-defined queries against Data Products.
  • Integration with Data API: Use Data APIs for programmatic access to Data Products.
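
To make these consumption paths concrete, here is a minimal sketch in Python. It assumes a Data Product exposed over a Postgres-compatible endpoint and a GraphQL endpoint; the environment variable names, the sales_360 database, the orders table, and all field names are illustrative placeholders, not actual DataOS endpoints. In practice, take the real connection details from the Data Product Hub.

```python
# Hypothetical consumption sketch: hostnames, variable names, dataset and
# field names are placeholders, not actual DataOS endpoints.
import os

import psycopg2  # pip install psycopg2-binary
import requests

# --- Postgres-style access ---------------------------------------------
# Credentials come from the environment rather than being hardcoded.
conn = psycopg2.connect(
    host=os.environ["DATAOS_PG_HOST"],               # assumed endpoint
    port=int(os.environ.get("DATAOS_PG_PORT", "5432")),
    user=os.environ["DATAOS_USER"],
    password=os.environ["DATAOS_APIKEY"],
    dbname="sales_360",                              # hypothetical Data Product
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT customer_segment, SUM(revenue) FROM orders GROUP BY 1")
    for segment, revenue in cur.fetchall():
        print(segment, revenue)

# --- GraphQL-style access ----------------------------------------------
query = "{ orders { customerSegment totalRevenue } }"  # hypothetical schema
resp = requests.post(
    os.environ["DATAOS_GRAPHQL_URL"],                # assumed endpoint
    json={"query": query},
    headers={"Authorization": f"Bearer {os.environ['DATAOS_APIKEY']}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["data"])
```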

Start learning: Click here to access the modules.

Data Product Developer

Data Product Developers play a key role in creating, managing, and evolving Data Products within DataOS. They are responsible for building the data infrastructure that powers everything from analytics to business intelligence, making sure data flows smoothly through pipelines and stays accurate and accessible for users. Plus, they ensure those Data Products deliver reliable insights while staying in line with governance policies.

Key responsibilities

Here are the key responsibilities of a Data Product Developer, though specific tasks may differ based on the role or objective:

  • Collaborate with stakeholders: Work with stakeholders to gather requirements and align Data Products with business objectives.

  • Design Data Products: Design semantic models, define quality and security standards, and determine how users will consume the data product.

  • Data Pipeline Management: Create data pipelines and implement data transformations to handle data ingestion efficiently.

  • Quality Assurance: Ensure data integrity through quality checks and monitoring.

  • Data Governance and Security: Apply appropriate data security and access controls, ensuring regulatory compliance.

  • Deployment and Maintenance: Deploy Data Products efficiently, monitor their performance, and manage updates using CI/CD practices.

Core modules

The learning track for Data Product Developers is divided into modules, each focusing on an essential stage of the data product lifecycle. Every module covers key topics that provide step-by-step guidance through hands-on examples and best practices, ensuring a comprehensive and practical learning experience.


Detailed module breakdown

Click here for details on the Data Product Developer learning track modules.

Module 1: Understanding Data Needs
This module focuses on grasping the business requirements that will guide the creation of the Data Product. Key activities include:
  • Understanding business goals: Align Data Products with overall business objectives.
  • Quality and security expectations: Identify quality standards and security protocols.
  • Collaboration with stakeholders: Work closely with business and technical teams to define needs.
  • Understanding consumption: Recognize how the Data Product will be consumed by end users.

Module 2: Designing Data Products
This module dives into the design phase using the DataOS Metis and Workbench tools.
  • Using DataOS Metis: Navigate Metis to explore data assets, understand data format and structure, and assess data quality while tracing its lineage.
  • Using Workbench: Conduct exploratory data analysis (EDA) to refine the data model.
  • Resource Identification: Identify the DataOS Resources required to build the product.
  • Security and Sensitivity: Identify sensitive data and establish relevant data policies.
  • Defining Quality and Service Level Objectives (SLOs): Set quality targets and performance benchmarks for the Data Product.
  • Defining Consumption Methods: Determine how users will consume the Data Product.

Module 3: Building Data Products
This module covers the technical aspects of constructing the Data Product; a hedged quality-check sketch follows it.
  • Creating Depots: Set up Depots for source and destination systems within DataOS.
  • Building Data Pipelines: Understand stream and batch data processing methods, write data transformations, and get introduced to data processing stacks.
  • Creating Lens Models: Develop logical data models that structure the Data Product.
  • Quality Checks: Implement quality checks to maintain data integrity.
  • Monitoring and Alerting: Set up monitoring and notification systems for ongoing oversight.
  • Data APIs: Create APIs to expose data for consumption by other systems.
  • Applying Access Control: Implement data policies that govern access and security.
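
To ground the quality-check topic, here is a hedged, tool-agnostic sketch in Python with pandas. The dataset, column names, and thresholds are assumptions for illustration; in DataOS you would typically express such expectations through the platform's own quality tooling rather than an ad hoc script.

```python
# Illustrative quality-check sketch; file, columns, and thresholds are assumed.
import pandas as pd

df = pd.read_parquet("orders.parquet")  # stand-in for a pipeline output

checks = {
    # Completeness: at most 1% of order IDs may be missing.
    "order_id_completeness": df["order_id"].notna().mean() >= 0.99,
    # Validity: revenue must never be negative.
    "revenue_non_negative": (df["revenue"] >= 0).all(),
    # Uniqueness: order IDs must not repeat.
    "order_id_unique": df["order_id"].is_unique,
    # Freshness: newest record under 24 hours old (updated_at assumed tz-aware UTC).
    "freshness_24h": (pd.Timestamp.now(tz="UTC") - df["updated_at"].max())
    < pd.Timedelta(hours=24),
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    raise ValueError(f"Quality checks failed: {failed}")
print("All quality checks passed.")
```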

Module 4: Deploying Data Products
The final module focuses on deploying the Data Product within DataOS.
  • Bundle Deployment: Use the DataOS CLI to create and apply deployment bundles.
  • Creating a Data Product Manifest File: Configure and apply the Data Product manifest file for deployment.
  • Performing Metadata Scans: Create a Scanner Workflow to provide visibility into metadata.
  • Validating the Data Product: Use CLI commands to validate the creation and configuration of the Data Product.
  • CI/CD: Implement continuous integration and deployment practices to streamline future updates.

Start learning: Click here to access the modules.

DataOS Operator

A DataOS Operator is the administrator responsible for managing and maintaining the DataOS platform. This role involves overseeing the system’s performance, ensuring the secure management of resources, and guaranteeing compliance with regulatory standards. The operator is the key figure who ensures the platform’s day-to-day operations run smoothly, providing a stable environment for all teams interacting with DataOS.

The DataOS Operator handles a range of tasks, from provisioning compute resources to managing access controls and system security. They are also responsible for monitoring system health, ensuring interoperability with external systems, and scaling the platform to meet growing demands. In essence, the DataOS Operator ensures the platform’s integrity and performance, allowing teams to leverage data efficiently while safeguarding critical assets.

Key responsibilities

A DataOS Operator could be an existing Forward Deployment Engineer, DevOps Engineer, or a Cloud Engineer. Here are the key responsibilities of a DataOS Operator:

  • Kubernetes cluster management: Oversee and manage Kubernetes clusters to ensure the optimal performance of the DataOS platform.

  • Cloud infrastructure management: Handle deployments and resource management on cloud platforms like AWS, GCP, or Azure.

  • System monitoring: Use tools like Prometheus and Grafana to monitor system health, track performance metrics, and resolve issues proactively.

  • Access control management: Manage authentication and authorization mechanisms to enforce data governance and ensure appropriate access to resources.

  • Container management: Manage Docker containers to ensure smooth operation within DataOS' containerized environment.

  • Minerva cluster management: Optimize and manage Minerva Clusters to handle query processing and ensure efficient resource use.

  • Credential and secret management: Securely manage sensitive information, including credentials and secrets, to maintain system integrity.

  • Compute resource provisioning and scaling: Provision and scale compute instances based on the platform’s needs, ensuring sufficient resources for workflows, jobs, and queries.

  • Regulatory compliance: Ensure that all platform operations comply with relevant regulatory standards for security and data management.

  • System security: Maintain the security of the DataOS platform, implementing best practices for resource and data protection.

Core modules

The learning track is divided into modules, with each module focusing on key operational areas. Each module contains topics that address common challenges you will encounter as a DataOS Operator, guiding you through the core aspects of the role and equipping you with the tools to troubleshoot efficiently.


Detailed module breakdown

Click here for details on the DataOS Operator learning track modules.

Module 1: Credential Security
Safeguard sensitive information by managing credentials securely within the DataOS platform. A hedged sketch follows this module.
  • Preventing credential exposure in code: Learn best practices for managing and securing credentials to prevent accidental exposure in code, including secure storage techniques and tools for credential management.
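
As a minimal illustration of the principle above, the following Python sketch reads credentials from the environment at runtime instead of hardcoding them. The variable names are hypothetical; in production these values would typically come from a dedicated secret manager.

```python
# Keep credentials out of code: read them from the environment at runtime.
# The variable names below are illustrative, not DataOS-defined.
import os

def get_required_secret(name: str) -> str:
    """Fetch a secret from the environment, failing loudly if it is absent."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"{name} is not set; export it or load it from a secret manager.")
    return value

# Never commit literals like these to a repository:
db_user = get_required_secret("SOURCE_DB_USER")
db_password = get_required_secret("SOURCE_DB_PASSWORD")
```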

Module 2: Data Source Connectivity
Learn how to establish secure and stable connections to data sources while adhering to best practices for security and performance.
  • Securing data source connections: Learn to set up secure connections to various data sources, including encrypting credentials and following security best practices to protect data access.

Module 3: Routine Checks
Learn how to ensure the reliable and efficient operation of the DataOS platform. Discover the importance of routine system health checks, configuration verification, and proactive issue detection.
  • Performing routine system health checks: Learn how to monitor the health of the platform regularly to prevent downtime.
  • Verifying configurations: Understand the significance of periodic configuration audits to maintain system integrity and efficiency.
  • Proactively detecting issues: Discover tools and techniques to identify potential problems early and address them before they escalate.

Module 4: DataOS Upgrade and Rollback Strategies
Master the essentials of managing platform upgrades with confidence. Learn to plan downtime, implement rollback strategies, and apply proactive measures like hotfixes to ensure seamless performance.
  • Planning downtime for upgrades: Learn to effectively plan and communicate platform downtime to minimize disruption.
  • Implementing rollback strategies: Understand how to quickly revert changes when issues arise post-upgrade.
  • Applying hotfixes proactively: Learn how to implement hotfixes to address potential issues and ensure stable performance during and after upgrades.

Module 5: System Monitoring
Proactively monitor the platform's system metrics with Prometheus and Grafana to ensure optimal performance and resolve issues before they affect operations. A hedged monitoring sketch follows this module.
  • Tracking key system metrics: Learn how to monitor resource usage, detect bottlenecks, and maintain platform health using Prometheus and Grafana.
  • Proactive issue resolution: Understand how to identify and address system issues before they impact operations.
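
As a small example of metric tracking, the following Python snippet queries the standard Prometheus HTTP API (/api/v1/query) for a CPU usage metric. The Prometheus URL and the PromQL expression are assumptions about your environment, not DataOS-specific values.

```python
# Hedged monitoring sketch: URL and PromQL expression are placeholders.
import requests

PROMETHEUS_URL = "http://prometheus.example.internal:9090"  # assumed address
PROMQL = "sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)"

resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query",   # standard Prometheus query endpoint
    params={"query": PROMQL},
    timeout=10,
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    namespace = series["metric"].get("namespace", "<none>")
    _timestamp, value = series["value"]
    print(f"{namespace}: {float(value):.3f} CPU cores")
```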

Module 6: Query Cluster Management
Understand how to optimize and manage query clusters to provide seamless data access and performance. A short cron-syntax sketch follows this module.
  • Optimizing query clusters for better performance: Identify and resolve issues related to underperforming query clusters, including resizing and reconfiguring clusters for optimal performance.
  • Scheduling query clusters using cron jobs: Learn how to schedule query clusters with cron jobs so they are available at specific times for batch processes or other scheduled tasks.
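
Because cron syntax is easy to misread, here is a small illustrative sketch that previews when a cron expression fires, using the third-party croniter package. It demonstrates the cron format only and is not DataOS's own cluster-scheduling mechanism.

```python
# Preview the next fire times of a cron expression (illustrative only).
from datetime import datetime

from croniter import croniter  # pip install croniter

# Fields: minute, hour, day-of-month, month, day-of-week.
# "30 7 * * 1-5" means 07:30 on every weekday.
schedule = croniter("30 7 * * 1-5", datetime(2025, 1, 1))

for _ in range(3):
    print(schedule.get_next(datetime))  # next three scheduled datetimes
```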

Module 7: Access Management
Ensure appropriate access control by managing user permissions and roles within the DataOS platform.
  • Granting appropriate user access: Understand the process of evaluating and granting user access requests, ensuring that permissions are allocated according to the principle of least privilege.

Start learning: Click here to access the modules.

Quick start guides

Looking for a fast way to get up and running? Our Quick Start Guides provide step-by-step instructions for performing key tasks and operations within DataOS. Perfect for getting things done quickly!

Videos

Explore our Video Library to watch tutorials that cover various topics from the basics to advanced features of DataOS.