(+351) 21 24 10006  ·  info@bconcepts.pt
Carnaxide, Lisbon
Data Warehouse
Data Warehouse 4 min

How to Implement SCD Type 2 in a Data Warehouse

João Barros 04 de July de 2026 4 min read

When a customer moves to a new city or a product changes category, a well-designed Data Warehouse does not overwrite the old value: it keeps the full history, so that reports about the past stay correct. The most common technique to achieve this is called SCD Type 2 (Slowly Changing Dimension), and it is simpler than it looks. Below you will build a customer dimension that records every version of every customer over time, using only SQL.

Prerequisites

  • Access to a relational database or Data Warehouse (SQL Server, Azure SQL, PostgreSQL, Snowflake…).
  • Basic SQL knowledge: CREATE TABLE, INSERT and UPDATE.
  • A source (staging) table with the current snapshot of customers, here called staging_cliente.

Step 1: Understand what changes in an SCD Type 2

A dimension stores the descriptive attributes of an entity — for example a customer's name and city. Imagine customer number 1 moves from Porto to Lisbon. In an SCD Type 1 you replace 'Porto' with 'Lisbon' and lose the previous information; any report about the past becomes wrong. In an SCD Type 2 you create a new row for the new city and keep the old row intact, marking the period during which each version was valid.

How to Implement SCD Type 2 in a Data Warehouse

To do this you need three control columns: a start date (valido_de), an end date (valido_ate) and a flag for the current version (atual).

Step 2: Create the dimension table

Besides the customer attributes, the table has a surrogate key that identifies each version, and the business key cliente_id, which repeats across the several versions of the same customer. The surrogate key matters because it is what the fact tables will point to: this way each sale is linked to exactly the version of the customer that existed on the sale date.

CREATE TABLE dim_cliente (
  cliente_sk  INT IDENTITY(1,1) PRIMARY KEY,
  cliente_id  INT          NOT NULL,
  nome        VARCHAR(100) NOT NULL,
  cidade      VARCHAR(100) NOT NULL,
  valido_de   DATE         NOT NULL,
  valido_ate  DATE         NULL,
  atual       BIT          NOT NULL
);

The validity interval is half-open: it goes from valido_de (inclusive) to valido_ate (exclusive). The current version has valido_ate set to NULL and atual = 1.

Step 3: Run the initial load

On the first run, every customer comes in as the current version. Run this step only once, to bootstrap the dimension from staging.

INSERT INTO dim_cliente (cliente_id, nome, cidade, valido_de, valido_ate, atual)
SELECT cliente_id, nome, cidade, CAST(GETDATE() AS DATE), NULL, 1
FROM staging_cliente;

Step 4: Close the versions that changed

From here on, steps 4 and 5 form the recurring load, the one you run on every update. First we close the current version of the customers whose name or city changed: we fill in the end date and set atual to 0.

UPDATE d
SET d.valido_ate = CAST(GETDATE() AS DATE),
    d.atual = 0
FROM dim_cliente d
JOIN staging_cliente s ON s.cliente_id = d.cliente_id
WHERE d.atual = 1
  AND (s.nome <> d.nome OR s.cidade <> d.cidade);

If any attribute allows NULL values, wrap the comparison with ISNULL (or COALESCE) so that changes to and from NULL are detected too.

Step 5: Insert the new version

Now we insert a new row for two cases: the customers we just closed in the previous step and brand-new customers. Both have one thing in common — right now they have no row with atual = 1. The LEFT JOIN finds exactly those.

INSERT INTO dim_cliente (cliente_id, nome, cidade, valido_de, valido_ate, atual)
SELECT s.cliente_id, s.nome, s.cidade, CAST(GETDATE() AS DATE), NULL, 1
FROM staging_cliente s
LEFT JOIN dim_cliente d
  ON d.cliente_id = s.cliente_id AND d.atual = 1
WHERE d.cliente_id IS NULL;

Check the result

Query the history of a customer that changed city. You should see two rows: the old one with atual = 0 and a filled-in end date, and the new one with atual = 1 and a NULL end date.

SELECT cliente_id, nome, cidade, valido_de, valido_ate, atual
FROM dim_cliente
WHERE cliente_id = 1
ORDER BY valido_de;

Also run a sanity check: each customer can only have one current version. The following query should return no rows.

SELECT cliente_id, COUNT(*) AS versoes_atuais
FROM dim_cliente
WHERE atual = 1
GROUP BY cliente_id
HAVING COUNT(*) > 1;

Conclusion

With these five steps you have a dimension that never loses history and that answers questions like 'which city was this customer in when they made the purchase'. The next step is to automate steps 4 and 5 in a scheduled procedure and, when the dimension has many attributes, to compare a hash of the columns instead of field by field, to keep the code shorter. Which of the dimensions in your model would benefit most from keeping history?

Share: