Disclaimer
This article is part of a series; we highly recommend reading the following articles first:
- The Foundation of a Scalable Data Architecture: Data Modeling 101
- Data Modeling: Dive into data normalization levels
These will provide the necessary context to fully grasp the concepts we discuss here.
Introduction
“Cause you’re a sky, full of stars, I’m gonna give my heart.” 🎶 While this may not be a standard way to begin a technical article, today we’ll be exploring the galaxies and the stars of data modeling 😁, analyzing how different modeling techniques impact analytical workloads!
From Normalization to Analytics: A Shift in Paradigm
In our previous article, we discussed normalization levels, ranging from 1NF to 5NF, and highlighted why Boyce-Codd Normal Form (BCNF) is crucial for transactional workloads (specifically for relational databases). BCNF ensures data integrity and reduces redundancy, making it an essential design principle for OLTP systems.
What Changes in Analytical Workloads?
Analytical systems, unlike transactional ones, follow the Insert-Once, Read-Many paradigm, meaning they prioritize read performance over write efficiency. The key challenge in analytics is optimizing query performance, often at the price of increased storage usage and redundancy.
In OLAP environments, we often rely on column-oriented storage for tables and data transmission (because it’s more efficient, see this article!), and data models are designed to enhance aggregation, reporting, and query performance. This is where denormalized models shine: they reduce the need for complex joins, making data retrieval much faster.
Common OLAP Modeling Techniques

Let’s explore three key modeling approaches used in analytical workloads, ordered from the most denormalized to the most normalized.
Wide Table Schema

A wide table schema is the simplest form of denormalization. It stores all relevant data in a single table, significantly reducing the need for joins.
- Data is flattened into one large table.
- Simplifies queries since everything is stored together.
- Increases storage consumption due to redundancy.
In real-world scenarios, wide table models aren’t very common. They usually appear in highly specialized cases that demand ultra-low latency, or in machine learning applications, where the computational overhead of joining multiple tables at query time becomes too costly.
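To make this concrete, here is a minimal sketch of what such a wide table could look like, using SQLite from Python purely for illustration; the table and column names (sales_wide, customer_country, and so on) are hypothetical, not taken from any particular system:

```python
import sqlite3

# In-memory database purely for illustration; table and column names
# are hypothetical, not taken from any real source system.
conn = sqlite3.connect(":memory:")

# One wide, denormalized table: customer and product attributes are
# flattened directly into each sales row, so queries need no joins.
conn.execute("""
    CREATE TABLE sales_wide (
        sale_id          INTEGER PRIMARY KEY,
        sale_date        TEXT,
        customer_name    TEXT,
        customer_country TEXT,
        product_name     TEXT,
        product_category TEXT,
        quantity         INTEGER,
        revenue          REAL
    )
""")
conn.execute("""
    INSERT INTO sales_wide VALUES
        (1, '2024-01-05', 'Alice', 'FR', 'Laptop', 'Electronics', 1, 1200.0),
        (2, '2024-01-06', 'Bob',   'DE', 'Laptop', 'Electronics', 2, 2400.0)
""")

# A typical analytical query: a single-table aggregation, no joins,
# at the cost of repeating 'Laptop' / 'Electronics' on every row.
for row in conn.execute(
    "SELECT product_category, SUM(revenue) FROM sales_wide GROUP BY product_category"
):
    print(row)  # ('Electronics', 3600.0)
```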
Dimensional Modeling
Dimensional modeling structures data into fact tables and dimension tables, making it the standard approach for business intelligence (BI) and data warehousing. This methodology was pioneered by Ralph Kimball, a renowned data warehousing expert, whose approach emphasizes optimizing query performance and usability in BI systems.
What Are Fact and Dimension Tables?
Dimension Tables: Provide descriptive context to facts, such as time, product details, customer demographics, or locations. They enable filtering, grouping, and drilling into the data for analysis.
👉 Don't forget: In dimension tables, using a surrogate key (such as a UUID) is crucial for maintaining consistency and preventing key conflicts. Natural keys, which come directly from source systems, may not be unique across different data sources.
Fact Tables: Contain measurable business data, such as sales transactions, revenue, or quantities. They are central to analytical queries and include foreign keys linking to dimension tables.
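To illustrate the surrogate-key note above, here is a small Python sketch of a warehouse-generated UUID sitting alongside the natural key coming from a source system; the field names (product_sk, product_nk, and so on) are ours, purely for illustration:

```python
import uuid

# A hypothetical dimension row arriving from a source system.
# 'source_product_id' is the natural key; it may collide across sources.
source_row = {"source_product_id": "P-42", "source_system": "erp_eu", "name": "Laptop"}

dim_product_row = {
    "product_sk": str(uuid.uuid4()),                # surrogate key, generated by the warehouse
    "product_nk": source_row["source_product_id"],  # natural key, kept for traceability
    "source_system": source_row["source_system"],
    "product_name": source_row["name"],
}
print(dim_product_row)
```

Fact rows then reference product_sk rather than the natural key, so two sources reusing the same product code can never collide inside the warehouse.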

Star Schema
- A single fact table is surrounded by multiple dimension tables.
- Simplifies queries as joins are straightforward.
- Optimized for high-speed aggregation (see the sketch below).
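Here is a minimal star-schema sketch, again using SQLite from Python for illustration; the fact and dimension names are hypothetical, and a real warehouse would obviously carry many more attributes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star schema: one fact table, two dimension tables.
    CREATE TABLE dim_date    (date_sk INTEGER PRIMARY KEY, full_date TEXT, year INTEGER);
    CREATE TABLE dim_product (product_sk INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE fact_sales  (
        sale_id    INTEGER PRIMARY KEY,
        date_sk    INTEGER REFERENCES dim_date(date_sk),
        product_sk INTEGER REFERENCES dim_product(product_sk),
        quantity   INTEGER,
        revenue    REAL
    );

    INSERT INTO dim_date    VALUES (1, '2024-01-05', 2024);
    INSERT INTO dim_product VALUES (10, 'Laptop', 'Electronics');
    INSERT INTO fact_sales  VALUES (100, 1, 10, 2, 2400.0);
""")

# A typical star-schema query: straightforward joins from the fact
# table out to its dimensions, followed by an aggregation.
query = """
    SELECT d.year, p.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales f
    JOIN dim_date d    ON f.date_sk = d.date_sk
    JOIN dim_product p ON f.product_sk = p.product_sk
    GROUP BY d.year, p.category
"""
for row in conn.execute(query):
    print(row)  # (2024, 'Electronics', 2400.0)
```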
Snowflake Schema
- Extends the star schema by normalizing dimension tables.
- Reduces data redundancy but introduces more joins.
- Suitable for more complex analytical use cases (a minimal sketch follows).
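And a minimal sketch of the snowflake variant, reusing the hypothetical product dimension from above: the category attributes are pushed out into their own, normalized table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical snowflake variant: the product dimension is normalized,
    -- so category attributes live in a separate table.
    CREATE TABLE dim_category (category_sk INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product  (
        product_sk   INTEGER PRIMARY KEY,
        product_name TEXT,
        category_sk  INTEGER REFERENCES dim_category(category_sk)
    );
""")

# Queries now need one extra hop (fact -> dim_product -> dim_category),
# the trade-off the snowflake schema accepts to reduce redundancy.
```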

Galaxy Schema
- Also known as fact constellation.
- Supports multiple fact tables sharing common dimensions.
- Used for complex multi-fact analysis in large enterprises (sketched briefly below).
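A very short sketch of a galaxy schema under the same hypothetical naming: two fact tables share the same conformed dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fact constellation: two business processes (sales and
    -- shipping) share the same conformed date and product dimensions.
    CREATE TABLE dim_date      (date_sk INTEGER PRIMARY KEY, full_date TEXT);
    CREATE TABLE dim_product   (product_sk INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE fact_sales    (date_sk INTEGER, product_sk INTEGER, revenue REAL);
    CREATE TABLE fact_shipping (date_sk INTEGER, product_sk INTEGER, shipping_cost REAL);
""")

# Analyses can now "drill across" both processes through the shared dimensions.
```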
Today, dimensional modeling remains the standard approach for around 90% of companies (or maybe less, just a guesstimate) when designing their Gold layer or Data Marts.
Golden rule of BI ✨: Always transform and calculate data before it reaches the analytical layer to ensure dashboards and analytics run smoothly.
One of dimensional modeling’s biggest benefits is that the query engine can optimize the joins between facts and dimensions for speed. However, to maintain those performance benefits, a few pitfalls should be avoided:
- Many-to-many cardinalities → These often lead to convoluted queries and slow performance.
- Direct fact-to-fact relationships → Fact tables should connect through shared dimensions instead of linking directly (see the sketch below).
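As a sketch of that second point, one common way to relate two fact tables (here a hypothetical fact_sales and fact_inventory, neither taken from the article) is to aggregate each of them to the grain of a shared dimension first, and only then join on that dimension key:

```python
# Hypothetical drill-across pattern: never join fact to fact directly;
# aggregate each fact to the shared dimension's grain, then join there.
# (Not executed here; the tables referenced are placeholders.)
query = """
    WITH sales_by_product AS (
        SELECT product_sk, SUM(revenue) AS total_revenue
        FROM fact_sales GROUP BY product_sk
    ),
    stock_by_product AS (
        SELECT product_sk, SUM(quantity_on_hand) AS total_stock
        FROM fact_inventory GROUP BY product_sk
    )
    SELECT p.product_name, s.total_revenue, i.total_stock
    FROM dim_product p
    LEFT JOIN sales_by_product s ON p.product_sk = s.product_sk
    LEFT JOIN stock_by_product i ON p.product_sk = i.product_sk
"""
```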
Data Vault Modeling

Data Vault is a hybrid approach that balances normalization and denormalization. It is designed for historical tracking, auditability, and scalability.
Key Components:
- Hubs → Represent core business entities (e.g., Customer, Product).
- Links → Capture relationships between entities.
- Satellites → Store descriptive attributes and historical changes.
Data Vault capitalizes on an insert-only pattern, using hash keys and load dates for tracking.
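Here is a minimal Python sketch of these building blocks, assuming hypothetical business keys and an MD5-based hash key (a common choice, but not the only one); the column names are illustrative only:

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*business_keys: str) -> str:
    # Deterministic hash over one or more business keys.
    return hashlib.md5("|".join(business_keys).upper().encode()).hexdigest()

load_ts = datetime.now(timezone.utc).isoformat()

# Hub: one row per core business entity, identified by its hashed business key.
hub_customer = {"customer_hk": hash_key("CUST-001"), "customer_bk": "CUST-001",
                "load_date": load_ts, "record_source": "crm"}
hub_product = {"product_hk": hash_key("P-42"), "product_bk": "P-42",
               "load_date": load_ts, "record_source": "erp"}

# Link: captures the relationship between the two hubs.
link_purchase = {"purchase_hk": hash_key("CUST-001", "P-42"),
                 "customer_hk": hub_customer["customer_hk"],
                 "product_hk": hub_product["product_hk"],
                 "load_date": load_ts, "record_source": "erp"}

# Satellite: descriptive attributes and their history; new versions are
# inserted (never updated), so every change stays traceable over time.
sat_customer = {"customer_hk": hub_customer["customer_hk"],
                "load_date": load_ts, "record_source": "crm",
                "name": "Alice", "country": "FR"}

print(link_purchase["purchase_hk"])
```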
This architecture is well-suited for:
- Historical traceability: Every data point can be tracked over time, supporting compliance and governance needs.
- Scalability: It handles large, evolving datasets well and is well-suited for integrating data from multiple sources.
- Event-driven architectures: As event streaming becomes more prevalent (and batch processing declines), the Data Vault model’s flexibility makes it an attractive choice for real-time or near-real-time data ingestion.
How Are These Used in Real Life?
In the past, the go-to data modeling approach was heavily normalized, following the methodology introduced by Bill Inmon, often called the father of Data Warehousing. This approach aimed to keep a normalized model all the way from the sources to the analytics layer, primarily because compute was expensive at the time (an argument that no longer really holds).
On the other hand, Ralph Kimball advocated for the opposite philosophy: no centralized data warehouse, just denormalized Data Marts optimized for analytical workloads.
Fast forward 20 years, and the industry has evolved by blending both approaches into a practical compromise. A common approach to organizing data in modern environments is the three-layer structure.
🥉 Bronze Layer (Raw Data)
- Stores raw, unprocessed data from different BU/SUs, countries, etc.
- Useful for Machine Learning & analytics teams who need full-fidelity data.
🥈 Silver Layer (Normalized – 3NF)
- Business-domain schemas (e.g. products, sales, resources)
- Data is cleaned, deduplicated, historized and normalized.
- Ensures consistency and prepares data for further transformations.
- We typically use plain 3NF or Data Vault to model this layer.
🥇 Gold Layer (BI-Ready – Dimensional or Wide Table)
- Data Product dimensional schemas that pre-aggregate and transform data to support fast, efficient analytical queries (a minimal sketch of this step follows after this list).
- Optimized for analytics & reporting, using either Dimensional Models or Wide Tables.
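To tie the layers together, here is a minimal, hypothetical sketch of a silver-to-gold step, again using SQLite from Python; the table names and the daily-revenue grain are assumptions made purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical silver table: already cleaned, deduplicated and normalized.
    CREATE TABLE silver_sales (sale_id INTEGER, sale_date TEXT, product_id TEXT, revenue REAL);
    INSERT INTO silver_sales VALUES
        (1, '2024-01-05', 'P-42', 1200.0),
        (2, '2024-01-05', 'P-42', 2400.0);

    -- Gold table: pre-aggregated at the grain the dashboards actually need.
    CREATE TABLE gold_daily_revenue AS
        SELECT sale_date, product_id, SUM(revenue) AS daily_revenue
        FROM silver_sales
        GROUP BY sale_date, product_id;
""")
print(conn.execute("SELECT * FROM gold_daily_revenue").fetchall())
# [('2024-01-05', 'P-42', 3600.0)]
```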
Conclusion
So, what do you think? Convinced yet? We’ve covered a lot in this article: the vast constellation of data models 🪐 in the analytics world, and how raw, unusable data is transformed into a polished dashboard presented to the company’s executive team.
If you want to learn more about how to keep historical data without driving your system crazy or storing billions and billions of lines (with no real use), check out our next article in the data modeling series: Data Modeling: Storing Historical Data.