Disclaimer

This article is part of a series; we highly recommend reading the following articles first:

These will provide the necessary context to fully grasp the concepts we discuss here.

Introduction

In the previous article, we explored the core foundational components that power the analytics world. Let’s start with a quick one-liner recap of each:

  • Databases: Systems optimized for storing and managing real-time transactional/operational data.
  • Data Warehouse: Centralized repository designed for structured and historical data.
  • Data Mart: Smaller, subject-specific slices of a DWH.
  • Data Lake: Scalable storage for raw, unstructured / semi-structured data.
  • Operational Data Store: A staging area for operational reporting and near real-time data access.
  • Master Data Management: A system for maintaining and governing a single, consistent, and authoritative source of key business data.

Today, we’ll take a step back and look at the big picture: how all these components come together to form a unified data architecture. We’ll give you an overview of the main architectures found in today’s landscape 🏞️.

✔️
If you’re interested in a deeper dive covering the history, layers, and practical applications of each architecture, you’ll find a dedicated, detailed article for every one of them.

Traditional Data Warehousing

Traditional data warehousing follows a structured, schema-on-write approach. Data is ingested through ETL pipelines into a centralized data warehouse, often complemented by downstream data marts. This architecture ensures a Single Version of Truth (SVOT) by consolidating, transforming, and standardizing data before it’s used for analysis.

Why it emerged? This was the first architecture built specifically to decouple analytics from operational systems. Instead of querying production databases directly, which posed risks to performance and data integrity, organizations could rely on a dedicated analytical environment designed for consolidating different sources, reporting, and decision-making.

Key Characteristics

  • Optimized for structured data using relational databases (often columnar) and supporting historical data analysis with slowly changing dimensions (SCDs).
  • Business-focused architecture, with dedicated ETL jobs running to ensure data freshness.
  • Ensures data consistency and governance through well-defined schemas.
  • Typically follows a layered architecture (implemented as relational tables in the database); a minimal sketch follows this list:
    • 1️⃣ Staging layer: where raw structured data lands after extraction, often minimally transformed.
    • 2️⃣ DWH layer (or core): where structured data is cleansed, standardized, and modeled into a normalized model (3NF-ish or Data Vault) aligned with business domains.
    • 3️⃣ Data Mart layer (or presentation): where structured data is reshaped into denormalized, domain-specific views for consumption by reporting tools or dashboards.
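
To make the layering concrete, here is a minimal sketch of the three layers using DuckDB as a stand-in warehouse engine. The file `crm_extract.csv`, the table names, and the columns are purely illustrative assumptions, not a reference implementation:

```python
import duckdb

# Illustrative only: DuckDB stands in for the warehouse engine,
# and crm_extract.csv is a hypothetical daily extract from a source CRM.
con = duckdb.connect("warehouse.duckdb")

# 1. Staging layer: land the raw extract with minimal transformation.
con.execute("""
    CREATE OR REPLACE TABLE stg_sales AS
    SELECT * FROM read_csv_auto('crm_extract.csv');
""")

# 2. Core (DWH) layer: cleanse and standardize into the normalized model.
con.execute("""
    CREATE OR REPLACE TABLE core_sales AS
    SELECT CAST(sale_id AS INTEGER)          AS sale_id,
           TRIM(product_code)                AS product_code,
           CAST(sale_date AS DATE)           AS sale_date,
           CAST(amount AS DECIMAL(12, 2))    AS amount
    FROM stg_sales
    WHERE sale_id IS NOT NULL;
""")

# 3. Data Mart layer: denormalized, domain-specific view for reporting.
con.execute("""
    CREATE OR REPLACE VIEW mart_monthly_sales AS
    SELECT product_code,
           date_trunc('month', sale_date) AS month,
           SUM(amount)                    AS total_amount
    FROM core_sales
    GROUP BY product_code, date_trunc('month', sale_date);
""")
```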

Challenges

  • High complexity due to rigid data models.
  • Data latency due to batch processing cycles (often daily).
  • Limited flexibility for semi-structured/unstructured data: how do you do Machine Learning and Gen AI when you only have structured data?

And in real life?

Imagine a global company like Yamaha, which sells both motorcycles and musical instruments. Each business unit may use its own CRM or sales system. You need to consolidate all of this into a unified view for strategic analysis. 

Each day, data is extracted from these operational systems (often including some semi-structured data from other sources), then staged, transformed, and loaded into a normalized data warehouse model. This model defines core business entities such as Product, Category, Sales, or Supplier.

From there, data marts are built for different domains (finance, operations, etc.), providing clean, tailored access to the data each team needs, without touching the raw sources.

Curious about the how, why, and what’s under the hood? Dive into the full technical breakdown here 👉 Architectural Deep Dive: Traditional Data Warehousing.

💡
While traditional data warehouses are not obsolete, the rise of modern data architectures has introduced alternatives that balance structure with flexibility.

Hybrid Data Warehousing

Hybrid Data Warehousing blends the best of both worlds: the structure and governance of traditional data warehouses with the flexibility and scale of data lakes. Instead of choosing between a rigid schema-on-write DWH and a freeform schema-on-read Data Lake, Hybrid Data Warehousing allows organizations to use both—in tandem and with purpose.

In this architecture, data flows through a layered pipeline that can land in a Data Lake and/or a Data Warehouse, depending on business needs. It enables organizations to serve both business analysts using SQL tools and data scientists leveraging raw, semi-structured datasets for advanced use cases like ML or Gen AI.

Why it emerged? When Data Lakes first appeared, many predicted the death 💀 of Data Warehouses. The promise was simple: ingest all types of data cheaply and flexibly. But reality hit quickly. The “put everything in the Data Lake” mindset didn’t work. Without proper metadata, governance, or schema enforcement, many Data Lakes became unmanageable swamps 🐸, making BI nearly impossible. Meanwhile, Data Warehouses struggled to handle modern data challenges like real-time streams and semi-structured formats. That’s where Hybrid Data Warehousing stepped in 🚀, offering the best of both worlds.

Key Characteristics

  • Coexistence of two architectural paradigms: A centralized Data Warehouse (and its associated data marts) operates alongside a Data Lake.
  • Dual modeling approaches: Combines schema-on-write (for structured, governed data in the DWH) with schema-on-read (for raw, exploratory data in the DL).
  • Versatile ingestion strategies: Supports both ETL pipelines feeding the warehouse and ELT workflows landing data in the lake (see the sketch after this list).
  • Diverse processing capabilities: Enables SQL-based querying for business users, while leveraging big data tools (e.g., Spark, Hive) for large-scale and semi-structured data processing.
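
As a rough illustration of the dual ingestion paths, here is a minimal PySpark sketch under stated assumptions: the paths, the `customers` CRM extract, and the `dwh.dim_customer` table are hypothetical, and a Spark session with warehouse support is assumed to be available.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hybrid-dwh-sketch").getOrCreate()

# ETL path: read a structured CRM extract, clean it, and load it into a
# governed, schema-on-write warehouse table for BI.
customers = spark.read.option("header", True).csv("/landing/crm/customers.csv")
clean = (customers
         .dropDuplicates(["customer_id"])
         .withColumn("email", F.lower(F.col("email"))))
clean.write.mode("overwrite").saveAsTable("dwh.dim_customer")

# ELT path: land raw clickstream events in the lake untouched;
# the schema is only applied on read, when data scientists explore it.
events = spark.read.json("/landing/web/clickstream/")
events.write.mode("append").parquet("/datalake/raw/clickstream/")
```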

Challenges

  • Data duplication: Imagine you have an important client table in your CRM. You use ELT to load it into your DWH for business intelligence. Meanwhile, a data scientist needs the same table for an ML model, so you export it as a Parquet file into the DL. Now the same data exists twice, in two formats, with different metadata, stored in distinct locations.
  • Higher cost and complexity: You’re maintaining two architectures, which means more tools, more storage, and more integration overhead.
  • Data governance becomes tricky: Synchronizing access control, lineage, and metadata between the lake and warehouse environments requires tight coordination between data engineers, analysts, and governance teams.

And in real life?

Take a major U.S. retailer like Target. They deal with high-volume, structured data such as point-of-sale transactions, inventory levels, financial records, and supplier orders. But they also collect unstructured or semi-structured data like website clickstreams, IoT sensor logs from shelves, and customer reviews.

In a Hybrid DWH setup, Target can load structured data like sales and inventory into the Data Warehouse for dashboards and reporting, while also storing it in the Data Lake as raw backups or for advanced forecasting models. Unstructured data, such as customer reviews, is kept in the lake to support natural language processing and behavioral analytics.

Looking to explore the architectural roots, evolution, and core principles in depth? We’ve got you covered 👉 Architectural Deep Dive: Hybrid Data Warehousing.

💡
Most large companies still operate at this stage today, but the high cost of maintaining two separate platforms and the duplication of data are pushing many organizations toward more unified solutions like Data Lakehouse or Data Fabric.

Data Lakehouse

Data Lakehouse architecture aims to unify the scalability and flexibility of a Data Lake with the reliability, structure, and performance of a Data Warehouse. It enables direct analytics on data stored in open formats like Parquet or ORC, while layering transactional capabilities, schema enforcement, and metadata management on top.

Instead of juggling a rigid Data Warehouse and a flexible Data Lake in a Hybrid Data Warehousing architecture, the Lakehouse provides a single platform to serve both BI analysts and AI/ML practitioners, without duplicating data or maintaining separate pipelines.

Key Characteristics

  • Built on open file formats with added support for transactional updates, schema enforcement, and metadata management
  • Enables record-level operations like insert, update, and delete directly in the lake
  • Supports time travel for auditability and historical analysis
  • Optimized for both BI and data science use cases
  • Typically follows a Medallion architecture (a minimal sketch follows this list):
    • 🟤 Bronze: Ingests raw data from diverse sources. Structured and semi-structured data is typically stored in Parquet or Delta, while unstructured data lands in object storage.
    • ⚪ Silver: Processes structured and semi-structured data to apply cleaning and normalization (3NF-ish or Data Vault). It improves consistency and schema alignment, and is stored in Delta format for reliability and time travel.
    • 🟡 Gold: Contains denormalized, domain-specific datasets (or data products), designed for direct consumption by BI tools, APIs, or ML models.
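
Here is a minimal sketch of a bronze → silver → gold flow using PySpark with Delta Lake, assuming a Spark session already configured with the delta-spark extensions; the paths, the `orders` dataset, and the column names are illustrative only:

```python
from pyspark.sql import SparkSession, functions as F

# Assumes a Spark session configured with the Delta Lake extensions.
spark = SparkSession.builder.appName("lakehouse-medallion-sketch").getOrCreate()

# Bronze: ingest raw orders as-is.
raw = spark.read.json("/landing/orders/")
raw.write.format("delta").mode("append").save("/lake/bronze/orders")

# Silver: cleanse, deduplicate, and align the schema.
bronze = spark.read.format("delta").load("/lake/bronze/orders")
silver = (bronze
          .dropDuplicates(["order_id"])
          .withColumn("order_date", F.to_date("order_ts")))
silver.write.format("delta").mode("overwrite").save("/lake/silver/orders")

# Gold: denormalized, domain-specific dataset ready for BI or ML.
gold = (silver.groupBy("customer_id", "order_date")
              .agg(F.sum("amount").alias("daily_spend")))
gold.write.format("delta").mode("overwrite").save("/lake/gold/customer_daily_spend")
```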

Challenges

  • Relational Layer Is Not Native: BI teams used to relational databases (with stored procedures, views, etc.) face a steep transition to Spark SQL or file-based logic like Parquet or Delta.
  • Performance Gaps: While improving rapidly, Lakehouses can still lag behind MPP warehouses in advanced query planning, indexing, and fine-grained access control. They may still be slower today, but they are catching up quickly 🏃!

And in real life?

Think of a fintech company handling transactions, product usage logs, and customer interactions. With a Lakehouse, they can store everything in the same environment and use SQL for dashboards, Spark for ML pipelines, and time travel for historical reviews—all from a single data source.
They stay compliant by deleting data at the record level when required, without rebuilding entire datasets. And they avoid the complexity and cost of maintaining two platforms.
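
For instance, the record-level deletion and the historical reviews mentioned above map to Delta Lake operations roughly like this (the table path and the `customer_id` predicate are hypothetical; a Delta-enabled Spark session is assumed):

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Assumes a Spark session configured with the Delta Lake extensions.
spark = SparkSession.builder.appName("lakehouse-compliance-sketch").getOrCreate()

# Record-level delete: remove one customer without rebuilding the whole dataset.
transactions = DeltaTable.forPath(spark, "/lake/silver/transactions")
transactions.delete("customer_id = 'C-42'")

# Time travel: query the table as it was at an earlier version for a historical review.
as_of_v10 = (spark.read.format("delta")
                  .option("versionAsOf", 10)
                  .load("/lake/silver/transactions"))
as_of_v10.show()
```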

Want to know how the Data Lakehouse works? Check this 👉 Architectural Deep Dive: Data Lakehouse.

✔️
Data Lakehouses are increasingly being adopted as primary analytical storage due to their flexibility and scalability!

Data Fabric

Data Fabric isn’t a storage architecture like the previous ones, but rather a logical architecture built on top of existing data platforms. Its primary goal is to provide a unified layer for data access and governance across all sources, regardless of their format, technology, or physical location.

It introduces a unified abstraction layer over your data ecosystem: virtualization, centralized metadata management, lineage tracking, data cataloging, policy enforcement, and automated data delivery.
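
To give a feel for the virtualization part, here is a minimal sketch using Trino’s Python client as one possible federated query engine; the host, catalog names, and tables are purely illustrative, and a real Data Fabric layers cataloging, lineage, and policy enforcement on top of this kind of access:

```python
import trino

# Illustrative only: a federated query joining a table in a Hive-backed data lake
# with a table living in an operational PostgreSQL database, without moving data.
conn = trino.dbapi.connect(
    host="fabric-query.example.com",  # hypothetical endpoint
    port=8080,
    user="analyst",
)
cur = conn.cursor()
cur.execute("""
    SELECT c.customer_id, c.region, SUM(o.amount) AS total_spend
    FROM lake_hive.sales.orders AS o
    JOIN crm_postgres.public.customers AS c
      ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.region
""")
rows = cur.fetchall()
```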


Key Characteristics

  • Combines multiple DWHs, DLs, and Lakehouses into a single managed ecosystem.
  • Unified data access through virtualization, APIs, and metadata catalogs.
  • Integrated governance including security policies, data lineage, and MDM.
  • Real-time processing capabilities for operational analytics.

Challenges

Let’s be clear: Data Fabric doesn’t eliminate complexity; it reorganizes it! It’s often adopted when you can’t centralize storage due to regulatory, geographical, or organizational constraints, like in multi-tenant or multi-region setups.

It acts as a modern Band-Aid 🩹, allowing distributed systems to converge into a governed framework. But it comes with trade-offs: higher integration cost, metadata dependency, and a steep setup curve.

And in real life?

A multinational bank with data silos in Europe, Asia, and North America uses a Data Fabric to unify access across all regions without physically moving sensitive data.

Want to know how Data Fabric works? Check this 👉 Architectural Deep Dive: Data Fabric.

💡
Data Fabric is gaining traction as it offers a unified layer enabling seamless access and governance across distributed environments. However, adoption is still in its early stages, and maturity levels vary.

Data Mesh

Data Mesh is a decentralized data architecture approach that shifts ownership to business domains, breaking away from centralized IT control. It is neither a physical architecture nor a virtual integration layer, but an organizational model that redefines how teams work with data.

Each domain is responsible for its own data products, built with quality, documentation, and governance in mind. This approach promotes autonomy, cross-functional collaboration, and scalable data delivery.


Key Characteristics

  1. Domain Ownership: Each domain team is responsible for the ingestion, transformation, and delivery of its own data products (dashboards, APIs, ML models).
  2. Data as a Product: Data teams in each domain treat consumers (analysts, scientists) as customers, ensuring high-quality, well-documented data (an illustrative sketch of such a product contract follows this list).
  3. Self-Service Platform: A central platform team provides infrastructure components like pipeline templates, deployment automation, monitoring, and data quality tools.
  4. Federated Governance: Enterprise-wide governance policies are centrally defined but locally enforced by each domain. This balance enables flexibility while maintaining trust, compliance, and standardization.
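
Purely as an illustration of the “data as a product” idea, a domain team might publish a small, machine-readable contract alongside its dataset. The fields below are hypothetical and not a standard; real implementations vary widely:

```python
from dataclasses import dataclass, field


@dataclass
class DataProductContract:
    """Hypothetical descriptor a domain team publishes with its data product."""
    name: str                      # e.g. "sales.orders_daily"
    owner_domain: str              # the accountable domain team
    output_port: str               # where consumers read it (table, API, topic)
    freshness_sla_hours: int       # how stale the data is allowed to be
    quality_checks: list[str] = field(default_factory=list)
    pii_fields: list[str] = field(default_factory=list)  # flagged for federated governance


orders_daily = DataProductContract(
    name="sales.orders_daily",
    owner_domain="sales",
    output_port="warehouse://sales/orders_daily",
    freshness_sla_hours=24,
    quality_checks=["order_id is unique", "amount >= 0"],
    pii_fields=["customer_email"],
)
```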

Challenges

  • Platform fragmentation: In theory, each domain can choose its own tech stack (e.g., Databricks, Cloudera, GCP), but in practice this leads to operational complexity and exponential costs 💸.
  • Interoperability: Sharing data products across heterogeneous stacks introduces friction (auth, access control, lineage tracking).
  • Governance at scale: Maintaining consistency across decentralized teams requires strong policies, metadata, and automation.
  • Cultural shift: Domains must adopt a product mindset, take ownership of data quality, and manage infrastructure lifecycles.

💡
Data Mesh remains a largely theoretical architecture to this day, as decentralizing data ownership leads to governance and standardization challenges. Let’s see what the future holds for us!

Summary

“Et voilà”, there is no real summary to write. This is a big article with a lot of technical detail, covering the main data architectures implemented in today’s organizations. Make sure to check the deep dives to get more out of each architecture’s characteristics, pros, and cons!