Disclaimer
This article is part of a series; we highly recommend reading the following articles first:
- The Foundation of a Scalable Data Architecture : Data Modeling 101
- Data Modeling: Dive into data normalization levels
This is a difficult article 😵💫, so understanding all the previous concepts is really important !
Introduction
One of the most overlooked yet important aspects of data modeling is understanding how data changes over time 🔄. Whether you’re working in a transactional system or designing an analytics platform, being explicit about data mutability helps drive better storage design, pipeline logic, and analytical accuracy !
At its core, data can be classified into two broad types based on its change behavior, either immutable or mutable. Each type has implications not only for how the data is stored, but also for how it is queried, audited, and modeled.
Types of data mutability

Immutable Data
Immutable data is data that, once written, is never modified ☠️. Any change to an immutable record results in a new record being added, while the old record remains untouched. Examples include append-only event logs, financial transactions, or blockchain records. In these systems, records represent facts at a point in time and are never updated or deleted after the fact.
For example, an event streaming system may maintain an immutable record of all user actions or sensor data, with each new event appended alongside a timestamp to preserve the full historical context. A similar approach is used in financial systems, where instead of modifying past records, changes are logged as new transactions, ensuring a transparent and traceable audit trail.
Mutable Data
Mutable data represents information that reflects the current state of an entity and is subject to change over time. It is stored in a way that allows values to be updated in place, meaning previous values are typically overwritten. The system always reflects the latest known value.
Examples include customer profiles, product information, or orders (with potentially changing statuses), all of which can evolve through user interactions with a system. Without applying specific modeling strategies, updates to such mutable data often overwrite previous states, leading to the loss of historical context 🗑️ and limiting the ability to perform retrospective analysis.
In transactional systems (OLTP)

First of all, OLTP systems prioritize high-speed transactional performance. Dumping 🚮 all records into one table can result in inefficient indexing and reduced system performance. A best practice is to separate the current state of the data from historical or archived records.
Immutable data
1️⃣ Table characteristic
Append-only tables are the norm. Each row is stamped with an immutable unique primary key + a creation timestamp (and often a natural business key). No UPDATE/DELETE is possible, only INSERT. Usually, we limit indexing to the primary key to keep the write-ahead log simple and efficient.
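A minimal PostgreSQL-style sketch of such an append-only table (table, column, and role names are illustrative assumptions, not a prescribed schema):

```sql
-- Append-only transaction log: rows are only ever INSERTed, never UPDATEd or DELETEd
CREATE TABLE transactions (
    transaction_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- immutable surrogate key
    account_id     BIGINT        NOT NULL,                           -- natural/business key
    amount         NUMERIC(12,2) NOT NULL,
    created_at     TIMESTAMPTZ   NOT NULL DEFAULT now()              -- timestamp of the fact
);

-- Optionally enforce the append-only rule at the database level
-- (app_user is a hypothetical application role)
REVOKE UPDATE, DELETE ON transactions FROM app_user;
```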
2️⃣ Historization strategies
The table is the history ! Every row already holds a timestamped snapshot that never changes, so you don’t need extra columns to track updates. When new events occur, you just add another row, and the complete history remains intact.
3️⃣ Optimization and Storage

However, allowing these tables to grow indefinitely, with millions of rows and no optimization strategy, quickly becomes unsustainable. That’s why we often turn to archiving or partitioning strategies to keep systems efficient and maintainable :
- Shadow Tables : A business rule might define that only the last 10 years of data are kept in the main tables. Older records are moved to a secondary table (like transactions_archived), keeping the primary table lean for better read/write performance and smaller indexes.
- Partitioning : Instead of storing all records in one large table, data can be split into partitions, such as by year or month (transactions_2025, transactions_2024). This approach reduces the volume of data scanned during queries, improving indexing and performance on large datasets (both ideas are sketched below).
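As a hedged illustration, here is a sketch of both ideas, reusing the transactions table from the earlier example but this time declared as a PostgreSQL-style partitioned table (transactions_archived is assumed to exist with the same schema):

```sql
-- Partitioning: split the table by year so queries filtered on created_at scan only one partition
CREATE TABLE transactions (
    transaction_id BIGINT        NOT NULL,
    account_id     BIGINT        NOT NULL,
    amount         NUMERIC(12,2) NOT NULL,
    created_at     TIMESTAMPTZ   NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE transactions_2024 PARTITION OF transactions
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE transactions_2025 PARTITION OF transactions
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Shadow table: an administrative job moves rows older than the retention window
-- into transactions_archived, then removes them from the hot table
INSERT INTO transactions_archived
    SELECT * FROM transactions WHERE created_at < now() - INTERVAL '10 years';
DELETE FROM transactions
    WHERE created_at < now() - INTERVAL '10 years';
```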
4️⃣ Deletion
Deletions are generally forbidden 🚫 ! Regulatory or business requirements can demand that every fact remain queryable forever (whether live or archived). Where legal obligations apply (like GDPR), deletion is usually accomplished by anonymizing or hashing personal fields while preserving the row itself, thereby maintaining the audit trail.
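A hedged sketch of that pattern, assuming the table holds hypothetical personal fields such as customer_email and customer_name:

```sql
-- GDPR-style "deletion" on an immutable table: keep the row, overwrite only the personal fields
UPDATE transactions
SET customer_email = md5(customer_email),   -- irreversible hash
    customer_name  = 'ANONYMIZED'
WHERE account_id = 42;                      -- the data subject requesting erasure
```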
Mutable Data
1️⃣ Table characteristic
Mutable data is typically stored in modifiable tables, where rows and attributes can be updated or changed over time. Those tables should be built for speed: one row per entity, always reflecting the latest state. Use a single stable primary key (an auto-incrementing key, for example), keep only the columns you truly need, and index the primary key and the fields you query most often. The goal is instant look-ups and lightning-fast updates; let another layer worry about history !
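A minimal sketch of such a current-state table (the customers table and its columns are illustrative):

```sql
-- Current-state table: one row per customer, always the latest values
CREATE TABLE customers (
    customer_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- stable surrogate key
    email       TEXT        NOT NULL,
    status      TEXT        NOT NULL,
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Index only what is queried most often
CREATE INDEX idx_customers_email ON customers (email);

-- Updates overwrite the previous value in place
UPDATE customers
SET status = 'inactive', updated_at = now()
WHERE customer_id = 42;
```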
2️⃣ Historization strategies
Mutable data needs explicit historization, or the past is lost. That’s why the decision to treat something as mutable vs immutable has enormous design consequences. Let’s see what the most commonly used techniques are to track changes in mutable data :

Audit tables are one of the most well-known ways to preserve the history of changes. A separate table tracks each UPDATE, DELETE, or INSERT, enabling reconstruction of past states (a sketch follows the list below). These audit mechanisms can be applied at three levels:
- Attribute-level: Track changes for a specific column (order status, etc.).
- Table-level: Log changes made to any field within a specific table.
- Database-level (rare): A centralized audit table logs changes across all tables.
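A minimal table-level audit sketch in PostgreSQL-style SQL, reusing the hypothetical customers table from the earlier sketch:

```sql
-- Table-level audit: every INSERT/UPDATE/DELETE on customers is logged as a row here
CREATE TABLE customers_audit (
    audit_id   BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    operation  TEXT        NOT NULL,           -- 'INSERT', 'UPDATE' or 'DELETE'
    changed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    old_row    JSONB,                          -- state before the change (NULL on INSERT)
    new_row    JSONB                           -- state after the change (NULL on DELETE)
);

CREATE OR REPLACE FUNCTION audit_customers() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO customers_audit (operation, new_row) VALUES (TG_OP, to_jsonb(NEW));
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO customers_audit (operation, old_row, new_row) VALUES (TG_OP, to_jsonb(OLD), to_jsonb(NEW));
        RETURN NEW;
    ELSE  -- DELETE
        INSERT INTO customers_audit (operation, old_row) VALUES (TG_OP, to_jsonb(OLD));
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_audit_customers
AFTER INSERT OR UPDATE OR DELETE ON customers
FOR EACH ROW EXECUTE FUNCTION audit_customers();
```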
In addition to audit logs, several other techniques can be used to maintain historical data in OLTP systems:

- Event Sourcing: Instead of storing the current state, every change is recorded as a discrete event (e.g., CUSTOMER_REGISTERED, EMAIL_UPDATED, ORDER_CANCELLED). The current state is then derived by replaying this sequence of events. This method treats otherwise mutable data as immutable and enables full traceability of how a particular state evolved.
- Versioning: Creating a new version of a row with each update. A column such as is_current with version_id indicates the active record, while older versions remain in the same table. This strategy simplifies historical tracking without the need to replay events and strikes a practical balance between implementation simplicity and historical completeness (sketched after this list).
- Temporal Tables: Supported natively by databases like SQL Server, MariaDB, or Oracle (and via extensions in PostgreSQL). They allow automatic tracking of row-level changes over time without requiring manual intervention. By defining a system time period, typically using valid_from and valid_to columns, the database maintains a complete history of modifications. Enabling SYSTEM_VERSIONING ensures that each change is captured and stored seamlessly.
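A hedged sketch of the versioning pattern from the list above, with is_current and version_id kept in the same table (names are illustrative):

```sql
-- Versioned customer table: every change inserts a new version instead of overwriting
CREATE TABLE customers_versions (
    customer_id BIGINT      NOT NULL,          -- business key
    version_id  INT         NOT NULL,          -- 1, 2, 3, ...
    email       TEXT        NOT NULL,
    status      TEXT        NOT NULL,
    is_current  BOOLEAN     NOT NULL DEFAULT TRUE,
    valid_from  TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (customer_id, version_id)
);

-- Applying a change = close the current version, then insert the new one
BEGIN;
UPDATE customers_versions
SET is_current = FALSE
WHERE customer_id = 42 AND is_current;

INSERT INTO customers_versions (customer_id, version_id, email, status)
SELECT 42, COALESCE(MAX(version_id), 0) + 1, 'new@mail.com', 'active'
FROM customers_versions
WHERE customer_id = 42;
COMMIT;
```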
3️⃣ Optimization and Storage
When soft-deleted or outdated rows (from versioning) pile up, the table gets bloated and slows down. Regularly move these inactive versions out to keep the main table light, using the same shadow tables 👻 as for immutable data.
4️⃣ Deletion
For deletions, soft deletes are preferred to preserve foreign key integrity. Hard deletes are typically avoided. For GDPR compliance, Personally Identifiable Information (PII) fields are often anonymized instead of deleting records.
Historization in NoSQL systems
Document Stores: Capable of both mutable and immutable historization. Mutable data can keep earlier versions inside the same document or in a companion change-tracking collection. For an immutable approach, each update is stored as a new document linked by a shared document or version ID.
Key-Value Stores: Redis and similar cache-oriented engines typically overwrite values and offer no durable history. In durable key-value systems like DynamoDB, you can emulate historization by writing each version under a timestamped key, but this is rarely used for long-term retention.
Wide-Column Stores: Naturally suited to append-only, immutable designs: every update is written as a new row keyed by a timestamp, enabling efficient historical queries.
Graph Databases: Changes are recorded through temporal nodes or edges that include validity intervals (valid_from, valid_to) or version metadata. This lets you reconstruct the graph at any point in time while preserving both mutable updates and immutable history.
In analytical systems (OLAP)

In OLAP systems, the goal is not to reflect the current state of the world in real-time, but to analyze how things have changed over time, and to support complex aggregations, trend analysis, and data exploration at scale 📈.
These systems are designed to handle large volumes of data, often coming from many different sources (operational databases, logs, semi-structured data, APIs) and transformed into formats optimized for querying. Because analytical needs prioritize completeness, consistency, and historical accuracy, the way data mutability is handled in OLAP differs from OLTP.
Immutable Data
1️⃣ Table characteristic
Immutable data is stored in append-only tables, similar to OLTP systems. These tables can be populated either row by row via streaming, or through batch processes such as full loads or incremental loads at regular intervals. The primary goal is to ensure auditability, reproducibility, and to simplify data analysis.
These tables are typically found in the Bronze Layer (ingestion); in the Silver Layer, they are cleaned, enriched, and made historically complete; and in the Gold Layer (data marts), they serve as fact tables optimized for analytics.

Those append-only tables come in different types, each suited to specific use cases:
- Transaction Tables: one row = one atomic transaction, like a payment, a purchase, a login, or a price change.
Use this for raw, detailed data that captures what happened, when, and by whom. The system just appends new transactions; there are no updates, no deletes.
- Periodic Snapshot Tables: one row = a full picture taken at regular intervals (daily stock, hourly balance, monthly inventory, etc.).
Use this when you need totals or states at specific points in time that can’t be rebuilt from raw transactions. Each snapshot writes a new file—purely append-only.
- Accumulating Snapshot Tables (rarely used): one row = the full lifecycle of a process (like ordered → confirmed → shipped → delivered).
Use this when you want a record that updates over time. It starts with a blank row, then updates it step by step using versioned, atomic file rewrites.
There’s also the Data Vault approach (explained here), often used in the Silver layer, which naturally supports append-only modeling. It’s designed to retain historical states by default, making historical tracking an inherent part of the model.
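To make the table types above concrete, here is a hedged sketch of a transaction table and a periodic snapshot table (all names are illustrative; in practice these would live in a warehouse or lakehouse format rather than an OLTP database):

```sql
-- Transaction table: one row per atomic event, append-only
CREATE TABLE fact_payments (
    payment_id  BIGINT,
    customer_id BIGINT,
    amount      NUMERIC(12,2),
    paid_at     TIMESTAMP
);

-- Periodic snapshot table: one row per entity per snapshot date, also append-only
CREATE TABLE fact_inventory_daily (
    snapshot_date     DATE,
    sku               TEXT,
    quantity_on_hand  INT
);

-- Each load appends new rows; nothing is updated or deleted
-- (inventory_current is a hypothetical source view)
INSERT INTO fact_inventory_daily
SELECT CURRENT_DATE, sku, quantity_on_hand
FROM inventory_current;
```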

2️⃣ Historization strategies
Immutable modeling in OLAP ensures that historical data is preserved by design. However, immutable does not mean indiscriminate ingestion.
You must define clear strategies for historization to avoid uncontrolled data growth and ensure analytical relevance:
- Transaction Tables / Data Vault: Define the temporal scope. Are you analyzing the last 2 years of activity, or the full 10-year archive ? Apply retention policies accordingly.
- Periodic Snapshots: Choose an appropriate snapshot cadence (hourly, daily, weekly) based on the volatility of the data and the resolution required for analysis.
- For all append-only tables: Determine the right level of detail. Do you store every inventory item by SKU, or aggregate by product category ?
3️⃣ Optimization and Storage
Since data is immutable and grows endlessly, optimization is crucial:
- Partitioning: Partition by time (day/month/year) or logical buckets (customer segment, region). This reduces scan time and improves query performance.
- Clustering: Within each partition, sort rows by frequently filtered columns (user_id, product_id) to maximize data skipping.
- Compaction (for Lakehouse): Merge small files into larger ones to reduce metadata overhead and improve I/O performance.
- Vacuum (for Lakehouse): Some storage formats (like Delta, Iceberg or Hudi) support time-travel queries and schema evolution, but make sure you vacuum the old files to keep performance up (see the sketch after this list).
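For the Lakehouse-specific items, a hedged illustration assuming a Delta Lake table managed through Spark SQL (the equivalent commands differ for Iceberg and Hudi; the table name is illustrative):

```sql
-- Compaction + clustering: rewrite small files into larger ones, co-locating rows by user_id
OPTIMIZE transactions ZORDER BY (user_id);

-- Vacuum: physically remove files no longer referenced by the table,
-- keeping 7 days (168 hours) of history for time travel
VACUUM transactions RETAIN 168 HOURS;
```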
4️⃣ Deletion
In immutable OLAP systems, deletions are not handled by modifying existing records. Instead, systems preserve history and apply one of the following strategies:
- Append a tombstone record: Instead of deleting, add a new row marking the entity as deleted (status = 'deleted' or an effective_to timestamp), as sketched below.
- Anonymize sensitive fields: For compliance, Personally Identifiable Information (PII) is overwritten via an append-only update or replaced in a new version of the dataset.
Deletion in OLAP is really about versioning, not erasure: maintaining a full audit trail while optimizing storage over time.
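A minimal sketch of the tombstone approach (table and column names are illustrative):

```sql
-- Instead of deleting customer 42, append a row that marks the entity as deleted
INSERT INTO dim_customer_events (customer_id, status, effective_from)
VALUES (42, 'deleted', CURRENT_TIMESTAMP);

-- Downstream queries keep only the latest known state per customer
SELECT customer_id, status
FROM (
    SELECT customer_id, status,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY effective_from DESC) AS rn
    FROM dim_customer_events
) latest
WHERE rn = 1;
```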
Mutable Data
1️⃣ Table characteristic
Mutable data in OLAP systems can be stored in either append-only tables or modifiable tables that support update and delete operations through mechanisms like MERGE. In the Ingestion Layer, data is typically ingested in an append-friendly manner, via event streams or periodic batch snapshots, to capture raw changes with full traceability.
As data progresses through the pipeline, particularly in the Silver and Gold Layers, it is often transformed, merged, or updated to apply business logic, reflect the current state, or support historical reconstruction.
2️⃣ Historization strategies
Periodic snapshots and Change-aware snapshots
Periodic snapshots capture the full state of a table at regular intervals (daily, weekly, etc.). This batch approach is simple and widely used, especially in bronze layers, but can lead to redundancy when most records remain unchanged.
Change-aware snapshots include only records that have changed since the last capture time. This reduces duplication and storage, but introduces ambiguity: if a record disappears, it’s unclear whether it was deleted or simply unchanged, which can complicate historical interpretation.
These snapshot strategies are often used in the bronze / ingestion layer to preserve the full history.
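A hedged sketch of both capture modes, assuming a hypothetical source table source_customers with an updated_at column (target table names are also illustrative):

```sql
-- Periodic snapshot: copy the entire table, stamped with the snapshot date
INSERT INTO bronze_customers_snapshot
SELECT CURRENT_DATE AS snapshot_date, c.*
FROM source_customers AS c;

-- Change-aware snapshot: copy only rows modified since the last capture
INSERT INTO bronze_customers_changes
SELECT CURRENT_DATE AS capture_date, c.*
FROM source_customers AS c
WHERE c.updated_at > TIMESTAMP '2025-05-31 00:00:00';  -- last capture time (parameterized in practice)
```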
Slowly Changing Dimensions (SCDs)

Slowly Changing Dimensions (SCDs) were introduced by Ralph Kimball in The Data Warehouse Toolkit, a foundational book on dimensional modeling. They provide structured methods for tracking changes in mutable data over time, helping maintain historical accuracy and consistency in analytical reporting.
- SCD Type 0: Once inserted, existing data is never changed, although adding new rows is permitted. Think of a parameter table (static data).
- SCD Type 1: Overwrites old values, retaining only the latest state (no history maintained). Only one current row remains per ID.
- SCD Type 2: Stores history using start_date, end_date, and versioning to track changes over time. It’s the most used strategy in the industry 🔥.
- SCD Type 3: Maintains limited historical changes using an additional column (previous_value).
- SCD Type 4: Moves historical data to a separate historical table.
- SCD Type 6: A hybrid approach combining types 1, 2, and 3 for comprehensive tracking.
SCDs are typically applied to dimension tables (like a customer table) when historical tracking is required.
💡 Industry tip : To retrieve historical context from an SCD2 table, you typically join it with a fact table using the business key and a range condition on the event date (see the sketch below). If you need to analyze the state of the dimension over time, like identifying which products were active at the end of each month, you can use a date dimension to perform a temporal join and generate snapshots across time.
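Following the tip above, a hedged sketch of joining a fact table to an SCD2 dimension on the business key and a date-range condition (table and column names are illustrative):

```sql
-- Attach the customer attributes that were valid at the time of each order
SELECT
    f.order_id,
    f.order_date,
    d.customer_segment                      -- attribute as it was when the order happened
FROM fact_orders AS f
JOIN dim_customer_scd2 AS d
  ON  d.customer_business_key = f.customer_business_key
  AND f.order_date >= d.start_date
  AND f.order_date <  COALESCE(d.end_date, DATE '9999-12-31');  -- open-ended current version
```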
Data Vault
Apart from SCDs, Data Vault can also be used to store mutable data because it inherently supports historical tracking by treating mutable data as if it were immutable. Changes are never overwritten ! Instead, new records are inserted with timestamps to preserve the full history.
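A hedged sketch of that insert-only behaviour with a hub and one satellite (a simplified Data Vault fragment; names are illustrative):

```sql
-- Hub: one row per business key, inserted once
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,     -- hash of the business key
    customer_bk   TEXT NOT NULL,        -- business key from the source
    load_ts       TIMESTAMP NOT NULL,
    record_source TEXT NOT NULL
);

-- Satellite: descriptive attributes, one new row per detected change (never updated)
CREATE TABLE sat_customer_details (
    customer_hk TEXT NOT NULL REFERENCES hub_customer (customer_hk),
    load_ts     TIMESTAMP NOT NULL,
    email       TEXT,
    status      TEXT,
    hash_diff   TEXT,                   -- hash of the attributes, used to detect changes
    PRIMARY KEY (customer_hk, load_ts)
);
```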

3️⃣ Optimization and Storage
You can apply the same optimization strategies as in the Immutable Data section: partitioning, clustering, compaction, and vacuuming all remain relevant.
4️⃣ Deletion
Use the same deletion strategies as described for immutable data: tombstone records, temporal versioning, and anonymization.
When required, physical deletion can be managed via retention policies and vacuuming, while preserving auditability.
Conclusion

Yeah, because I’m not haha… This was one of the most difficult articles to write; data modeling is hell 🔥.
Like, everyone wants to do AI without math, and everyone wants to do data without modeling !
But I think this article, which more or less sums it all up, will be your bible 📖, the one you’ll come back to again and again whenever you’ve got questions to answer.