Skip to main content

Slowly Changing Dimensions – An introduction

Being new to data warehousing, I never heard about the term "Slowly Changing Dimensions" before. It sounded like academic jargon. It isn't. It's one of those concepts that, once it clicks, explains half the weird design decisions you'll see in any reporting database.

This post is the explainer I wish existed before I had to learn it the hard way: by breaking a report

Facts and dimensions, the short version

Data warehouses generally split data into two kinds of tables.

  • Fact tables hold the things that happened: an order, a sale, a support ticket, a click. They're typically just numbers and foreign keys — quantity, amount, a reference to which customer, which product, which date.
  • Dimension tables hold the descriptive context around those facts: who the customer is, what the product is called, which region a salesperson covers. A fact on its own is nearly meaningless. "47 units, customer 1182, product 309" tells you nothing until you join it against the dimension tables and find out that's 47 units of a winter jacket, sold to a customer in Antwerp, by a rep in the Benelux region.

This split is the foundation of the star schema, and it's why dimension tables exist at all: to give facts meaning without repeating descriptive data in every single fact row.

So what's "slowly changing" about it?

Here's the problem. Dimension data isn't static. A customer moves. A product gets reclassified into a different category. A sales rep gets reassigned to a different territory. These changes don't happen constantly — they're not like a stock price ticking every second — but they do happen, unpredictably, over time. That's the "slowly" in Slowly Changing Dimension: not never, just not often, and not on any schedule you can plan around.

The question this forces on you is simple to state and surprisingly consequential to answer:

When a dimension attribute changes, what happens to the old value?

Ralph Kimball catalogued these patterns decades ago in dimensional modeling, and the industry still refers to them by number rather than by name. There are more of them than the two everyone reaches for first, so it's worth knowing the full set even if you'll mostly live in two of them.

Type 0: nothing changes, by design

Not every attribute should ever change, and Type 0 is the explicit acknowledgment of that. These are values fixed at the point a record is created and meant to stay that way for the life of the record: a date of birth, an original signup date, a product's launch date. If the source system corrects one of these, that's treated as a one-off data fix, not a business event worth tracking — unlike a Type 1 change, it isn't expected to happen as part of normal operation.

A common practical example is a date dimension table. The date 2026-06-30 was, is, and always will be a Tuesday in late June. Nothing about it changes, which makes Type 0 the right (if unglamorous) default for it.

Type 1: overwrite it

The simplest option. When the value changes, you update the existing row. The old value is gone. Your dimension table always reflects the current state of the world, and nothing else.

CustomerID Name City
123 Acme Corp Antwerp

Acme moves to Brussels. After a Type 1 update:

CustomerID Name City
123 Acme Corp Brussels

That's it. No trace that Antwerp ever existed. This is fine — good, even — when nobody needs to know the history. Correcting a typo in a customer's name is a textbook Type 1 case: there's no business value in remembering the misspelled version.

The catch: every fact table row that references this customer now implicitly "moves" to Brussels too, even facts from years ago when the customer really was in Antwerp. If you're reporting on historical sales by city, last year's Antwerp sales just silently became Brussels sales. Nobody decided that should happen. It's just a side effect of overwriting.

Type 2: keep everything

Instead of overwriting, you close out the old row and insert a new one. The old row stays in the table, marked as no longer current, typically with effective dates and a current-row flag.

Surrogate Key CustomerID Name City Valid From Valid To Is Current
1001 C-123 Acme Corp Antwerp 2023-01-15 2026-02-20 No
1002 C-123 Acme Corp Brussels 2026-02-20 NULL Yes

Now your fact table can join against the version of the customer that was actually true at the time the fact happened, using effective dates. Last year's Antwerp sales stay Antwerp sales. This is the pattern you reach for when history genuinely matters: compliance, audit trails, "what did the org chart look like in Q2," trend analysis that needs to reflect the world as it actually was at each point in time.

This is also why Type 2 introduces a surrogate key — a new, meaningless, auto-generated ID per version of the row — separate from the natural key (CustomerID here, the identifier that comes from the source system and stays stable across versions). The fact table joins on the surrogate key, not the natural key, because the surrogate key is what pins a fact to a specific version of the dimension. This distinction trips up almost everyone the first time, so it's worth sitting with: one customer, one natural key, potentially many surrogate keys over its lifetime.

The cost is exactly what you'd expect: more rows, more storage, and more complexity in every query that needs "current state" rather than "state as of."

Type 3: keep just the previous value

A middle ground, and the least commonly used in practice. Instead of a new row per change, you add a column for "previous value" alongside "current value."

CustomerID Name Current City Previous City
123 Acme Corp Brussels Antwerp

This only remembers one step back. If Acme moves a second time, the Antwerp history is gone, overwritten just like Type 1 would have done. It's useful for narrow cases — "show me what changed in the last update" dashboards — but it doesn't generalize to full history, so most real-world systems that need history at all skip straight to Type 2.

Type 4: split current and history into separate tables

Type 4 takes a different structural approach than 1 through 3. Instead of cramming history into the same table as the current record, you keep two tables: a current table that always holds the latest version of each row, and a separate history table that accumulates every prior version. When a change happens, the current table gets updated in place and the version it replaced gets pushed into the history table.

The appeal is performance and simplicity for the common case. Most queries only care about current state, so they hit a small, fast, current-only table. The history table only gets touched when someone specifically needs to look back, which tends to be rarer. This separation also keeps the "what is true right now" table clean, without Valid_From/Valid_To/Is_Current columns cluttering every query that doesn't care about history.

The tradeoff is now you're maintaining two tables instead of one, and joins that need to reconstruct "what did the full picture look like on this date" have to reach into the history table explicitly rather than just filtering a single table.

Type 1 and Type 2 cover the overwhelming majority of real scenarios you'll run into day to day. But Type 0 and Type 4 aren't edge cases either — you'll likely use Type 0 in every date dimension you ever build, and Type 4 shows up regularly in high-volume operational systems where query performance on current state matters. Knowing the full set means picking deliberately instead of reaching for Type 2 by default because it's the one with the most documentation.

How to actually decide

Start by ruling out Type 0: is this attribute supposed to ever change at all? If not — birth dates, launch dates, anything that's fixed at creation — that's Type 0 and you're done with that column.

For everything else, the question that matters isn't technical, it's: will anyone ever need to know what this looked like before it changed?

If yes, you're in Type 2 or Type 4 territory (full history). If no, Type 1. A few prompts that tend to surface the right answer:

  • Could a regulator or auditor ask "what was true on this date"? → Type 2 (or Type 4 if current-state query performance is a real concern).
  • Is this purely operational lookup data nobody analyzes historically (a status code, a corrected typo)? → Type 1.
  • Does the business explicitly want to measure something "as it was then" — last quarter's sales by the regions that existed then, this employee's assignment history? → Type 2.
  • Would preserving history just bloat the table with noise nobody will query? → Type 1.
  • Is the table so large or so frequently queried for current state that even a well-indexed Is_Current filter becomes a real cost? → consider Type 4, splitting current and history into separate tables.

It's also fine — common, even — for the same dimension table to apply different types to different columns. A customer's name correction might be Type 1, their signup date Type 0, while their region assignment is Type 2, because only one of those is something the business wants to track changes for.

Why this matters before you touch any tooling

It's tempting to skip straight to "how do I implement SCD Type 2 in [tool of choice]," but the type decision is a business decision made before any implementation, not a technical default you pick because it seems more thorough. Type 2 isn't strictly better than Type 1 — it's more expensive in every dimension that matters (storage, query complexity, ETL logic) and that cost is only worth paying when the business genuinely needs the history.

Get this decision wrong in either direction and you'll find out later, usually when someone asks a question your data can't answer (picked Type 1, needed history) or when you're maintaining a bloated dimension table nobody ever queries historically (picked Type 2, didn't need it).

Once you know which type you need, the implementation question becomes much narrower. In a follow-up post, I'll walk through how Microsoft Fabric handles SCD Type 2 specifically, including the newer no-code option in Copy job that handles most of this mechanism for you.

More information

Slowly changing dimension - Wikipedia

Popular posts from this blog

Podman– Command execution failed with exit code 125

After updating WSL on one of the developer machines, Podman failed to work. When we took a look through Podman Desktop, we noticed that Podman had stopped running and returned the following error message: Error: Command execution failed with exit code 125 Here are the steps we tried to fix the issue: We started by running podman info to get some extra details on what could be wrong: >podman info OS: windows/amd64 provider: wsl version: 5.3.1 Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM Error: unable to connect to Podman socket: failed to connect: dial tcp 127.0.0.1:2655: connectex: No connection could be made because the target machine actively refused it. That makes sense as the podman VM was not running. Let’s check the VM: >podman machine list NAME         ...

Azure DevOps/ GitHub emoji

I’m really bad at remembering emoji’s. So here is cheat sheet with all emoji’s that can be used in tools that support the github emoji markdown markup: All credits go to rcaviers who created this list.

VS Code Planning mode

After the introduction of Plan mode in Visual Studio , it now also found its way into VS Code. Planning mode, or as I like to call it 'Hannibal mode', extends GitHub Copilot's Agent Mode capabilities to handle larger, multi-step coding tasks with a structured approach. Instead of jumping straight into code generation, Planning mode creates a detailed execution plan. If you want more details, have a look at my previous post . Putting plan mode into action VS Code takes a different approach compared to Visual Studio when using plan mode. Instead of a configuration setting that you can activate but have limited control over, planning is available as a separate chat mode/agent: I like this approach better than how Visual Studio does it as you have explicit control when plan mode is activated. Instead of immediately diving into execution, the plan agent creates a plan and asks some follow up questions: You can further edit the plan by clicking on ‘Open in Editor’: ...