Depot as Lakehouse

Whenever a user ingests data, it is stored in the DataOS Lakehouse. To learn more about the Lakehouse, please refer to this link.

For example:

Consider a company in the e-commerce sector that wants to analyze user behavior data across multiple platforms. The company collects large volumes of raw, unstructured user interaction data (clicks, browsing history, and purchase patterns) in a data lake. However, the marketing team requires structured, real-time insights for personalized advertising and targeting. To meet this need, the company implements the DataOS Lakehouse. The raw data is stored in a cloud-based object storage system such as AWS S3, but instead of working directly with the raw files, the team uses Apache Iceberg as a unified table format so that the data can be queried efficiently. A metastore tracks the table metadata, which ensures consistency and fast access for analysis. The team can now run complex queries and data processing pipelines that feed machine learning models predicting future customer behavior, while still storing and processing data at large scale. This setup gives the business real-time analytics and long-term insights from the same data store.

The manifest file below defines the alphaomega Lakehouse, which operates on the Iceberg format. Although the manifest is tagged Azure, its storage layer is an S3-based lakehouse Depot that uses the dataos-lakehouse bucket with a relative path of /test. Access control is configured through two secrets: alphaomega-r for read access and alphaomega-rw for read-write access. The Lakehouse uses an Iceberg REST Catalog as its metastore and Themis as its query engine.

# Resource-meta section
name: alphaomega
version: v1alpha
type: lakehouse
tags:
    - Iceberg
    - Azure
description: lakehouse depot of storage-type S3
owner: iamgroot
layer: user

# Lakehouse-specific section
lakehouse:
    type: iceberg
    compute: runnable-default
    iceberg:

        # Storage section
        storage:
            depotName: alphaomega
            type: s3
            s3:
                bucket: dataos-lakehouse
                relativePath: /test
            secrets:
                - name: alphaomega-r
                  keys:
                      - alphaomega-r
                  allkeys: true
                - name: alphaomega-rw
                  keys:
                      - alphaomega-rw
                  allkeys: true

        # Metastore section
        metastore:
            type: "iceberg-rest-catalog"

        # Query engine section
        queryEngine:
            type: themis
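
To make the structure of the manifest easier to follow, the sketch below mirrors it as a plain Python dictionary and extracts the settings described above. It is an illustration only: summarize_lakehouse is a hypothetical helper, not part of DataOS or its CLI, and the dictionary assumes that the storage, metastore, and queryEngine blocks are nested under lakehouse.iceberg and that the secrets list sits under storage.

```python
# Illustrative only: the manifest mirrored as a plain dict.
# ASSUMPTION: storage, metastore and queryEngine nest under lakehouse.iceberg.
MANIFEST = {
    "name": "alphaomega",
    "version": "v1alpha",
    "type": "lakehouse",
    "lakehouse": {
        "type": "iceberg",
        "compute": "runnable-default",
        "iceberg": {
            "storage": {
                "depotName": "alphaomega",
                "type": "s3",
                "s3": {"bucket": "dataos-lakehouse", "relativePath": "/test"},
                "secrets": [
                    {"name": "alphaomega-r", "keys": ["alphaomega-r"], "allkeys": True},
                    {"name": "alphaomega-rw", "keys": ["alphaomega-rw"], "allkeys": True},
                ],
            },
            "metastore": {"type": "iceberg-rest-catalog"},
            "queryEngine": {"type": "themis"},
        },
    },
}


def summarize_lakehouse(manifest: dict) -> dict:
    """Hypothetical helper: check the expected sections and flatten them."""
    if manifest.get("type") != "lakehouse":
        raise ValueError("expected a manifest with type: lakehouse")
    table_format = manifest["lakehouse"]["type"]      # "iceberg"
    body = manifest["lakehouse"][table_format]        # block named after the format
    storage = body["storage"]
    backend = storage["type"]                         # "s3"
    if backend not in storage:
        raise ValueError(f"storage.type is {backend!r} but storage.{backend} is missing")
    return {
        "depot": storage["depotName"],
        "backend": backend,
        "bucket": storage[backend]["bucket"],
        "path": storage[backend]["relativePath"],
        "secrets": [secret["name"] for secret in storage.get("secrets", [])],
        "metastore": body["metastore"]["type"],
        "query_engine": body["queryEngine"]["type"],
    }


summary = summarize_lakehouse(MANIFEST)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The helper looks up the storage sub-block by the value of storage.type, so a manifest that declares type: s3 without a matching s3 block fails early instead of producing a partial summary.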