Disclaimer
This article is part of a series; we highly recommend reading the following articles first:
- Analytical vs Transactional: Why You Can’t Mix Oil and Water
- The Foundation of a Scalable Data Architecture: Data Storage
These will provide the necessary context to fully grasp the concepts we discuss here.
Introduction
In the previous article, we explored the core foundational components that power the analytics world. Let’s start with a quick one-liner recap of each:
- Databases: Systems optimized for storing and managing real-time transactional/operational data.
- Data Warehouse: Centralized repository designed for structured and historical data.
- Data Mart: Smaller, subject-specific slices of a DWH.
- Data Lake: Scalable storage for raw, unstructured / semi-structured data.
- Operational Data Store: A staging area for operational reporting and near real-time data access.
- Master Data Management: A system for maintaining and governing a single, consistent, and authoritative source of key business data.
Today, we’ll take a step back and look at the big picture: how all these components come together to form a unified data architecture. We’ll give you an overview of the main architectures found in today’s landscape 🏞️.

Traditional Data Warehousing
Traditional data warehousing follows a structured, schema-on-write approach. Data is ingested through ETL pipelines into a centralized data warehouse, often complemented by downstream data marts. This architecture ensures a Single Version of Truth (SVOT) by consolidating, transforming, and standardizing data before it’s used for analysis.
Why did it emerge? This was the first architecture built specifically to decouple analytics from operational systems. Instead of querying production databases directly, which posed risks to performance and data integrity, organizations could rely on a dedicated analytical environment designed for consolidating different sources, reporting, and decision-making.

Key Characteristics
- Optimized for structured data using relational databases (often columnar) and supporting historical data analysis with slowly changing dimensions (SCDs).
- Business-focused architecture, with dedicated ETL jobs running to ensure data freshness.
- Ensures data consistency and governance through well-defined schemas.
- Typically follows a layered architecture, implemented as relational tables (see the sketch after this list):
- 1️⃣ Staging layer: where raw structured data lands after extraction, often minimally transformed.
- 2️⃣ DWH layer (or core): where structured data is cleansed, standardized, and modeled into a normalized model (3NF-ish or Data Vault) aligned with business domains.
- 3️⃣ Data Mart layer (or presentation): where structured data is reshaped into denormalized, domain-specific views for consumption by reporting tools or dashboards.
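To make this layering concrete, here is a minimal sketch of a load moving data from staging to the core and into a mart. It uses in-memory SQLite purely for illustration; all table and column names are hypothetical, and a real warehouse would run this in a dedicated engine orchestrated by an ETL tool.

```python
import sqlite3

# Minimal sketch of a layered warehouse load (illustrative names, in-memory SQLite).
con = sqlite3.connect(":memory:")

# 1️⃣ Staging: raw extracts land here with minimal transformation.
con.execute("CREATE TABLE stg_sales (sale_id INT, product_code TEXT, amount REAL, sold_at TEXT)")
con.execute("INSERT INTO stg_sales VALUES (1, 'MOTO-R1', 12000.0, '2024-05-01'), "
            "(2, 'PIANO-C3', 8000.0, '2024-05-01')")

# 2️⃣ Core (DWH): cleansed, standardized, normalized business entities.
con.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_code TEXT, category TEXT)")
con.execute("CREATE TABLE fact_sales (sale_id INT, product_key INT, amount REAL, sold_at TEXT)")
con.execute("""
    INSERT INTO dim_product (product_code, category)
    SELECT DISTINCT product_code,
           CASE WHEN product_code LIKE 'MOTO-%' THEN 'Motorcycles' ELSE 'Instruments' END
    FROM stg_sales
""")
con.execute("""
    INSERT INTO fact_sales
    SELECT s.sale_id, p.product_key, s.amount, s.sold_at
    FROM stg_sales s JOIN dim_product p ON p.product_code = s.product_code
""")

# 3️⃣ Mart (presentation): denormalized, domain-specific view for BI tools.
con.execute("""
    CREATE VIEW mart_sales_by_category AS
    SELECT p.category, SUM(f.amount) AS total_amount
    FROM fact_sales f JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
""")
print(con.execute("SELECT * FROM mart_sales_by_category").fetchall())
```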
Challenges
- High complexity due to rigid data models.
- Data latency due to batch processing cycles (often daily).
- Limited flexibility for semi-structured/unstructured data: how do you do Machine Learning and Gen AI with only structured data?
And in real life?
Imagine a global company like Yamaha, which sells both motorcycles and musical instruments. Each business unit may use its own CRM or sales system. You need to consolidate all of this into a unified view for strategic analysis.
Each day, data is extracted from these operational systems (often including some semi-structured data from other sources), then staged, transformed, and loaded into a normalized data warehouse model. This model defines core business entities such as Product, Category, Sales, or Supplier.
From there, data marts are built for different domains (finance, operations, and so on), giving each team clean, tailored access to the data it needs without touching the raw sources.
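Historical analysis on a dimension like Product typically relies on the slowly changing dimension pattern mentioned earlier. Below is a minimal SCD Type 2 sketch, again with hypothetical table and column names: close the current version of the row, then insert the new one.

```python
import sqlite3

# Minimal SCD Type 2 sketch: keep history by closing the old row and inserting a new one.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_code TEXT, category TEXT,
        valid_from TEXT, valid_to TEXT, is_current INT
    )
""")
con.execute("INSERT INTO dim_product (product_code, category, valid_from, valid_to, is_current) "
            "VALUES ('PIANO-C3', 'Keyboards', '2023-01-01', '9999-12-31', 1)")

def apply_scd2(con, product_code, new_category, change_date):
    """Close the current version of the row, then insert the new version."""
    con.execute(
        "UPDATE dim_product SET valid_to = ?, is_current = 0 "
        "WHERE product_code = ? AND is_current = 1",
        (change_date, product_code),
    )
    con.execute(
        "INSERT INTO dim_product (product_code, category, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, '9999-12-31', 1)",
        (product_code, new_category, change_date),
    )

# The product moves to a new category: both versions are kept for historical analysis.
apply_scd2(con, 'PIANO-C3', 'Acoustic Pianos', '2024-05-01')
print(con.execute("SELECT * FROM dim_product ORDER BY valid_from").fetchall())
```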
Curious about the how, why, and what’s under the hood? Dive into the full technical breakdown here 👉 Architectural Deep Dive: Traditional Data Warehousing.
Hybrid Data Warehousing
Hybrid Data Warehousing blends the best of both worlds: the structure and governance of traditional data warehouses with the flexibility and scale of data lakes. Instead of choosing between a rigid schema-on-write DWH and a freeform schema-on-read Data Lake, Hybrid Data Warehousing allows organizations to use both—in tandem and with purpose.
In this architecture, data flows through a layered pipeline that can land in a Data Lake and/or a Data Warehouse, depending on business needs. It enables organizations to serve both business analysts using SQL tools and data scientists leveraging raw, semi-structured datasets for advanced use cases like ML or Gen AI.
Why did it emerge? When Data Lakes first emerged, many predicted the death 💀 of Data Warehouses. The promise was simple: ingest all types of data cheaply and flexibly. But reality hit quickly. The “put everything in the Data Lake” mindset didn’t work. Without proper metadata, governance, or schema enforcement, many Data Lakes became unmanageable swamps 🐸, making BI nearly impossible. Meanwhile, Data Warehouses struggled to handle modern data challenges like real-time streams and semi-structured formats. That’s where Hybrid Data Warehousing stepped in 🚀, offering the best of both worlds.

Key Characteristics
- Coexistence of two architectural paradigms: A centralized Data Warehouse (and its associated data marts) operates alongside a Data Lake.
- Dual modeling approaches: Combines schema-on-write (for structured, governed data in the DWH) with schema-on-read (for raw, exploratory data in the DL).
- Versatile ingestion strategies: Supports both ETL pipelines feeding the warehouse and ELT workflows landing data in the lake (see the sketch after this list).
- Diverse processing capabilities: Enables SQL-based querying for business users, while leveraging big data tools (e.g., Spark, Hive) for large-scale and semi-structured data processing.
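As a rough illustration of these dual ingestion paths, the sketch below extracts a table once and feeds both the lake (raw Parquet, ELT-style) and the warehouse (cleansed, ETL-style) with PySpark. All connection details, paths, and table names are hypothetical, and a real pipeline would handle credentials and scheduling properly.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative dual ingestion in a hybrid setup: one extract, two destinations.
spark = SparkSession.builder.appName("hybrid-ingestion").getOrCreate()

# Extract once from an operational source (hypothetical CRM database over JDBC).
crm_clients = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://crm-host:5432/crm")
    .option("dbtable", "public.clients")
    .option("user", "etl_user").option("password", "***")
    .load()
)

# ELT path: land the raw data in the Data Lake as Parquet for data science use.
crm_clients.write.mode("overwrite").parquet("s3a://company-lake/raw/crm/clients/")

# ETL path: cleanse/standardize, then load into the Data Warehouse for BI.
cleansed = (
    crm_clients
    .dropDuplicates(["client_id"])
    .withColumn("email", F.lower(F.col("email")))
)
(
    cleansed.write.format("jdbc")
    .option("url", "jdbc:postgresql://dwh-host:5432/dwh")
    .option("dbtable", "core.dim_client")
    .option("user", "etl_user").option("password", "***")
    .mode("append")
    .save()
)
```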
Challenges
- Data duplication: Imagine you have an important client table in your CRM. You use ELT to load it into your DWH for business intelligence. Meanwhile, a data scientist needs the same table for an ML model, so you export it as a Parquet file into the DL. Now the same data exists twice, in two formats, with different metadata, stored in distinct locations.
- Higher cost and complexity: You’re maintaining two architectures, which means more tools, more storage, and more integration overhead.
- Data governance becomes tricky: Synchronizing access control, lineage, and metadata between the lake and warehouse environments requires tight coordination between data engineers, analysts, and governance teams.
And in real life?
Take a major U.S. retailer like Target. They deal with high-volume, structured data such as point-of-sale transactions, inventory levels, financial records, and supplier orders. But they also collect unstructured or semi-structured data like website clickstreams, IoT sensor logs from shelves, and customer reviews.
In a Hybrid DWH setup, Target can load structured data like sales and inventory into the Data Warehouse for dashboards and reporting, while also storing it in the Data Lake as raw backups or for advanced forecasting models. Unstructured data, such as customer reviews, is kept in the lake to support natural language processing and behavioral analytics.
Looking to explore the architectural roots, evolution, and core principles in depth? We’ve got you covered 👉 Architectural Deep Dive: Hybrid Data Warehousing.
Data Lakehouse
Data Lakehouse architecture aims to unify the scalability and flexibility of a Data Lake with the reliability, structure, and performance of a Data Warehouse. It enables direct analytics on data stored in open formats like Parquet or ORC, while layering transactional capabilities, schema enforcement, and metadata management on top.
Instead of juggling a rigid Data Warehouse and a flexible Data Lake in a Hybrid Data Warehousing architecture, the Lakehouse provides a single platform to serve both BI analysts and AI/ML practitioners, without duplicating data or maintaining separate pipelines.

Key Characteristics
- Built on open file formats with added support for transactional updates, schema enforcement, and metadata management
- Enables record-level operations like insert, update, and delete directly in the lake
- Supports time travel for auditability and historical analysis
- Optimized for both BI and data science use cases
- Typically follows a Medallion architecture (illustrated in the sketch after this list):
- 🟤 Bronze: Ingests raw data from diverse sources. Structured and semi-structured data is typically stored in Parquet or Delta, while unstructured data lands in object storage.
- ⚪ Silver: Processes structured and semi-structured data to apply cleaning and normalization (3NF-ish or Data Vault), improving consistency and schema alignment; data is stored in Delta format for reliability and time travel.
- 🟡 Gold: Contains denormalized, domain-specific datasets (or data products), designed for direct consumption by BI tools, APIs, or ML models.
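Here is a minimal PySpark/Delta Lake sketch of that Medallion flow, including a record-level delete and a time-travel read. It assumes the delta-spark package is installed; paths, schemas, and values are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable

# Build a local Spark session with Delta Lake enabled (delta-spark must be installed).
builder = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# 🟤 Bronze: raw ingested records, stored as-is.
raw = spark.createDataFrame(
    [(1, "Alice@Example.com ", 120.0), (2, "bob@example.com", 75.5)],
    ["client_id", "email", "amount"],
)
raw.write.format("delta").mode("overwrite").save("/tmp/lake/bronze/transactions")

# ⚪ Silver: cleaned and standardized version of the same data.
silver = (
    spark.read.format("delta").load("/tmp/lake/bronze/transactions")
    .withColumn("email", F.lower(F.trim(F.col("email"))))
)
silver.write.format("delta").mode("overwrite").save("/tmp/lake/silver/transactions")

# 🟡 Gold: denormalized, consumption-ready aggregate for BI dashboards or ML features.
gold = silver.groupBy("client_id").agg(F.sum("amount").alias("total_spent"))
gold.write.format("delta").mode("overwrite").save("/tmp/lake/gold/client_spend")

# Record-level operation directly in the lake (e.g. correcting or erasing one client).
silver_table = DeltaTable.forPath(spark, "/tmp/lake/silver/transactions")
silver_table.delete("client_id = 2")

# Time travel: read the silver table as it was before the delete (version 0).
spark.read.format("delta").option("versionAsOf", 0).load("/tmp/lake/silver/transactions").show()
```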
Challenges
- Relational Layer Is Not Native: BI teams used to relational databases (with stored procedures, views, etc.) face a steep transition to Spark SQL and file-based formats like Parquet or Delta.
- Performance Gaps: Lakehouses can still lag behind MPP warehouses in advanced query planning, indexing, and fine-grained access control, but they are quickly catching up 🏃!
And in real life?
Think of a fintech company handling transactions, product usage logs, and customer interactions. With a Lakehouse, they can store everything in the same environment and use SQL for dashboards, Spark for ML pipelines, and time travel for historical reviews—all from a single data source.
They stay compliant by deleting data at the record level when required, without rebuilding entire datasets. And they avoid the complexity and cost of maintaining two platforms.
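Concretely, a right-to-be-forgotten request could be handled with a record-level delete; note that time travel keeps older file versions around, so physically removing them also requires a vacuum once the retention period has passed. A short sketch reusing the Spark session from the Lakehouse example above (table path and predicate are hypothetical):

```python
from delta.tables import DeltaTable

# Hypothetical compliance erasure, reusing the `spark` session from the earlier sketch.
transactions = DeltaTable.forPath(spark, "/tmp/lake/silver/transactions")

# Logically remove one client's records without rebuilding the dataset.
transactions.delete("client_id = 42")

# Old file versions are kept for time travel; vacuum physically removes those older
# than the retention period (7 days by default).
transactions.vacuum()
```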
Want to know how the Data Lakehouse works? Check this 👉 Architectural Deep Dive: Data Lakehouse.
Data Fabric
Data Fabric isn’t a storage architecture like the previous ones, but rather a logical architecture built on top of existing data platforms. Its primary goal is to provide a unified layer for data access and governance across all sources, regardless of their format, technology, or physical location.
It introduces a unified abstraction layer over your data ecosystem: virtualization, centralized metadata management, lineage tracking, data cataloging, policy enforcement, and automated data delivery.

Key Characteristics
- Combines multiple DWHs, Data Lakes, and Lakehouses into a single managed ecosystem.
- Unified data access through virtualization, APIs, and metadata catalogs (see the sketch after this list).
- Integrated governance including security policies, data lineage, and MDM.
- Real-time processing capabilities for operational analytics.
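As an illustration of virtualized access, a federation engine such as Trino (one common building block behind data virtualization layers) can join data that physically lives in different systems through a single SQL interface. A minimal sketch with the Trino Python client; the host, catalogs, and table names are hypothetical.

```python
import trino  # Trino Python client (pip install trino)

# Hypothetical federated query: join a table living in a Hive/Data Lake catalog with
# a table living in a PostgreSQL warehouse, without physically moving the data.
conn = trino.dbapi.connect(
    host="fabric-gateway.example.com",  # hypothetical virtualization endpoint
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()
cur.execute("""
    SELECT c.region, SUM(t.amount) AS total_amount
    FROM postgresql.core.dim_client AS c
    JOIN hive.lake.transactions AS t ON t.client_id = c.client_id
    GROUP BY c.region
""")
for region, total in cur.fetchall():
    print(region, total)
```

The point is that consumers query one endpoint while the data stays where it is; the fabric layer handles metadata, access policies, and routing underneath.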
Challenges
Let’s be clear: Data Fabric doesn’t eliminate complexity; it reorganizes it! It’s often adopted when you can’t centralize storage due to regulatory, geographical, or organizational constraints, like in multi-tenant or multi-region setups.
It acts as a modern Band-Aid 🩹, allowing distributed systems to converge into a governed framework. But it comes with trade-offs: higher integration cost, metadata dependency, and a steep setup curve.
And in real life?
A multinational bank with data silos in Europe, Asia, and North America uses a Data Fabric to unify access across all regions without physically moving sensitive data.
Want to know how Data Fabric works? Check this 👉 Architectural Deep Dive: Data Fabric.
Data Mesh
Data Mesh is a decentralized data architecture approach that shifts ownership to business domains, breaking away from centralized IT control. It is not a physical architecture nor a virtual integration layer, but an organizational model that redefines how teams work with data.
Each domain is responsible for its own data products, built with quality, documentation, and governance in mind. It promotes autonomy, cross-functional collaboration, and scalable data delivery.

Key Characteristics
- Domain Ownership: Each domain team is responsible for the ingestion, transformation, and delivery of its own data products (dashboards, APIs, ML models).
- Data as a Product: Data teams in each domain treat consumers (analysts, scientists) as customers, ensuring high-quality data (see the sketch after this list).
- Self-Service Platform: A central platform team provides infrastructure components like pipeline templates, deployment automation, monitoring, and data quality tools.
- Federated Governance: Enterprise-wide governance policies are centrally defined but locally enforced by each domain. This balance enables flexibility while maintaining trust, compliance, and standardization.
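One way to make “data as a product” tangible is a small, machine-readable product descriptor (owner, schema, SLA, sensitive columns) that the self-service platform and federated governance policies can act on. The sketch below is purely hypothetical; real implementations often use YAML contracts or catalog tooling instead.

```python
from dataclasses import dataclass, field

# Hypothetical data product descriptor: exact fields vary by organization, but the idea
# is a machine-readable contract that the platform and governance layer can enforce.
@dataclass
class DataProduct:
    name: str
    domain: str                 # owning business domain
    owner: str                  # accountable team or person
    output_port: str            # where consumers read it (table, API, topic...)
    schema: dict                # column name -> type, validated on publish
    freshness_sla_hours: int    # how stale the product is allowed to be
    pii_columns: list = field(default_factory=list)  # drives access policies

orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data-team@example.com",
    output_port="lakehouse.gold.orders_daily",
    schema={"order_id": "string", "client_id": "string", "amount": "double", "order_date": "date"},
    freshness_sla_hours=24,
    pii_columns=["client_id"],
)

def violates_governance(product: DataProduct) -> bool:
    """Example federated policy, enforced locally: every product needs an owner and a positive SLA."""
    return not product.owner or product.freshness_sla_hours <= 0

print(violates_governance(orders))
```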
Challenges
- Platform fragmentation: In theory, each domain can choose its own tech stack (e.g., Databricks, Cloudera, GCP), but in practice this leads to operational complexity and exponential costs 💸.
- Interoperability: Sharing data products across heterogeneous stacks introduces friction (auth, access control, lineage tracking).
- Governance at scale: Maintaining consistency across decentralized teams requires strong policies, metadata, and automation.
- Cultural shift: Domains must adopt a product mindset, take ownership of data quality, and manage infrastructure lifecycles.
Summary
“Et voilà”, there’s no real summary to write. This is a big article with a lot of technical detail, covering the main data architectures implemented in today’s organizations. Make sure to check the deep dives to get more out of each architecture’s characteristics, pros, and cons!