Source-Aligned Data Product
Overview
Raw data is where most data journeys begin, but in its untouched state it’s rarely fit for immediate use. Although it holds valuable insights, raw data typically comes with inconsistencies, missing values, and formatting issues that make it hard to trust, interpret, or act on.
📘 Scenario
Our Retail Source-Aligned Data Product cleans, transforms, and governs customer, product, and sales data, then makes it accessible through an intuitive, self-service experience. The result: trusted, high-quality data that’s ready for exploration, analysis, and downstream consumption.
💡 What Is a Source-Aligned Data Product?
A Source-Aligned Data Product (also known as an Entity-First Data Product) is a curated, high-quality dataset that closely mirrors the structure and behavior of its source system, with one critical difference: it’s cleaned, validated, and made fit for purpose.
These products preserve the original data’s granularity and lineage while enforcing strict quality checks such as completeness, accuracy, and consistency. They're built to serve as trustworthy foundations for analytics, dashboards, and more advanced Data Products.
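To make the three quality dimensions above concrete, here is a minimal Python sketch of checks that might run on customer records before they are published. It is not the platform’s own validation mechanism: the record fields (`customer_id`, `email`, `country`), the reference country list, the thresholds, and every helper name are hypothetical. Completeness is measured as the share of filled values, accuracy is approximated by format and reference-list validity (true accuracy needs a trusted external reference), and consistency is checked by looking for duplicate entity keys.

```python
import re

# Hypothetical raw customer records; field names are illustrative assumptions.
RAW_CUSTOMERS = [
    {"customer_id": "C001", "email": "ana@example.com", "country": "US"},
    {"customer_id": "C002", "email": None, "country": "us"},
    {"customer_id": "C003", "email": "not-an-email", "country": "DE"},
    {"customer_id": "C001", "email": "ana@example.com", "country": "US"},
]

# Simple format rule and reference list; both are assumptions for this sketch.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
VALID_COUNTRIES = {"US", "DE", "GB", "IN"}


def is_filled(value):
    """Treat None and empty strings as missing."""
    return value not in (None, "")


def completeness(rows, field):
    """Fraction of rows where `field` holds a non-empty value."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if is_filled(r.get(field))) / len(rows)


def validity(rows, field, is_valid):
    """Fraction of non-empty values of `field` that pass `is_valid`."""
    values = [r[field] for r in rows if is_filled(r.get(field))]
    if not values:
        return 1.0
    return sum(1 for v in values if is_valid(v)) / len(values)


def duplicate_keys(rows, key):
    """Key values that occur more than once, which breaks entity uniqueness."""
    seen, dupes = set(), set()
    for r in rows:
        k = r.get(key)
        if k in seen:
            dupes.add(k)
        seen.add(k)
    return dupes


def run_checks(rows, min_completeness=0.95, min_validity=0.99):
    """Return (passed, metrics) for completeness, accuracy, and consistency checks."""
    metrics = {
        "email_completeness": completeness(rows, "email"),
        "email_validity": validity(rows, "email", lambda v: bool(EMAIL_PATTERN.match(v))),
        "country_validity": validity(rows, "country", lambda v: v in VALID_COUNTRIES),
        "duplicate_ids": sorted(duplicate_keys(rows, "customer_id")),
    }
    passed = (
        metrics["email_completeness"] >= min_completeness
        and metrics["email_validity"] >= min_validity
        and metrics["country_validity"] >= min_validity
        and not metrics["duplicate_ids"]
    )
    return passed, metrics


if __name__ == "__main__":
    passed, metrics = run_checks(RAW_CUSTOMERS)
    print(passed, metrics)
```

On the sample records, `run_checks` reports a failure: one email is missing, one is malformed, one country code is lowercase, and `C001` appears twice. In a real pipeline, a failing result would typically block publication or raise an alert rather than let the data flow downstream unchecked.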
Key features
Source-Aligned (Entity-First) Data Product:
- Source Representation: Mirrors the structure and content of operational systems for familiarity and traceability.
- Entity-Centric Design: Built around key business entities like customers, products, and transactions.
- High Data Quality: Undergoes thorough profiling and validation for accuracy, completeness, and consistency.
- Governance-Ready: Aligned with organizational policies and access control standards.
- Lineage & Traceability: Retains full visibility into data origins and transformations.
- Ready for Consumption: Provides a trusted input for BI dashboards, analytics, and downstream pipelines.
- Scalable Foundation: Reduces duplication and inconsistency across Data Products and teams.
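The Source Representation and Lineage & Traceability features can be illustrated with a small cleaning step. The Python sketch below assumes hypothetical product rows with `product_id`, `name`, `category`, and `price` fields, and adds two illustrative lineage columns, `_source_system` and `_extracted_at`; these names are not a platform convention. It fixes formatting problems (stray whitespace, inconsistent casing, empty strings, numeric text) while keeping the source key and producing exactly one output row per source row, so the cleaned data keeps the source’s granularity.

```python
from datetime import datetime, timezone


def clean_text(value):
    """Trim whitespace and treat empty strings as missing."""
    if value is None:
        return None
    text = str(value).strip()
    return text or None


def clean_price(value):
    """Parse a numeric string into a float rounded to two decimals; None if missing."""
    text = clean_text(value)
    return round(float(text), 2) if text is not None else None


def clean_product(raw, source_system, extracted_at):
    """Return a cleaned copy of one source row plus lineage metadata."""
    category = clean_text(raw.get("category"))
    return {
        # Keep the source primary key unchanged so each row traces back to its origin.
        "product_id": raw["product_id"],
        "name": clean_text(raw.get("name")),
        # Normalize casing so "Kitchen" and "kitchen" are treated as one category.
        "category": category.lower() if category else None,
        "price": clean_price(raw.get("price")),
        # Hypothetical lineage columns recording where and when the row came from.
        "_source_system": source_system,
        "_extracted_at": extracted_at.isoformat(),
    }


def clean_batch(rows, source_system, extracted_at=None):
    """Clean every row; the output keeps the source grain (one row in, one row out)."""
    extracted_at = extracted_at or datetime.now(timezone.utc)
    return [clean_product(r, source_system, extracted_at) for r in rows]


if __name__ == "__main__":
    raw_rows = [
        {"product_id": "P1", "name": "  Mug ", "category": "Kitchen", "price": "12.5"},
        {"product_id": "P2", "name": "", "category": None, "price": ""},
    ]
    for row in clean_batch(raw_rows, source_system="erp"):
        print(row)
```

The sketch normalizes values but never drops, merges, or aggregates rows. Deduplication and quality gating would be separate, explicit steps, which keeps the mapping from source row to cleaned row one-to-one and easy to trace.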
Self-check quiz
1. What benefit does a source-aligned Data Product provide to downstream consumers like analysts or dashboard users?
A. Fast ingestion speed
B. Access to raw, unfiltered data
C. Trusted, high-quality data ready for exploration and analysis
D. Encryption for compliance
2. Which of the following is an expected output of a source-aligned Data Product?
A. Executable code
B. Raw logs
C. Clean, validated datasets ready for downstream analytics
D. Deployment scripts
Next step
Now that you understand the concept and value of source-aligned Data Products, it’s time to get hands-on.
In the next module, you’ll learn how to connect to raw data sources and set up the building blocks needed to bring your Data Product to life.