Slowly Changing Dimensions Type 2: preserving history in the data warehouse
João Barros
20 de March de 2025
2 min read
Slowly Changing Dimensions (SCD) solve a fundamental problem in data warehousing: what to do when a dimension attribute changes? The answer depends on the business need to preserve historical state.
The 3 main types
TYPE 1 — Overwrite (no history):
Before: Customer X — Segment: SMB
After the change: Customer X — Segment: Enterprise
→ The past always looks Enterprise. Good for fixing errors.
TYPE 2 — New record (full history):
Record 1: Customer X, Segment SMB, valid_from: 2020-01-01, valid_to: 2023-12-31, is_current: 0
Record 2: Customer X, Segment Enterprise, valid_from: 2024-01-01, valid_to: 9999-12-31, is_current: 1
→ Historical analysis shows SMB on sales until 2023. The most used.
TYPE 3 — Additional column (only latest + previous):
Customer X — Current_Segment: Enterprise, Previous_Segment: SMB
→ Preserves only 1 change. Rare but useful for reorganizations.
Implement SCD Type 2 with MERGE
-- ETL process: apply source changes to the dimension
MERGE dbo.DimCustomer AS tgt
USING (SELECT nk_customer, name, segment, country FROM stg.Customers) AS src
ON tgt.nk_customer = src.nk_customer AND tgt.is_current = 1
-- Type 1: update attributes that keep no history (name - error fix)
WHEN MATCHED AND tgt.name <> src.name
THEN UPDATE SET tgt.name = src.name
-- Type 2: close the current record and insert a new one
WHEN MATCHED AND tgt.segment <> src.segment
THEN UPDATE SET tgt.valid_to = CAST(GETDATE() AS DATE), tgt.is_current = 0;
-- Insert new records: new customers + new version of SCD2
INSERT INTO dbo.DimCustomer (nk_customer, name, segment, country, valid_from, valid_to, is_current)
SELECT src.nk_customer, src.name, src.segment, src.country,
CAST(GETDATE() AS DATE), '9999-12-31', 1
FROM stg.Customers src
LEFT JOIN dbo.DimCustomer tgt ON src.nk_customer = tgt.nk_customer AND tgt.is_current = 1
WHERE tgt.sk_customer IS NULL
OR (tgt.sk_customer IS NOT NULL AND tgt.segment <> src.segment);
Correct join in reports
-- For correct historical analysis, use the surrogate key from the transaction date
SELECT
s.revenue,
c.segment, -- the customer's segment AT THE TIME of the sale
d.year
FROM dbo.FactSales s
JOIN dbo.DimCustomer c ON s.sk_customer = c.sk_customer -- SK preserves historical context
JOIN dbo.DimDate d ON s.sk_date = d.sk_date
Conclusion
SCD Type 2 is the backbone of correct historical analysis. Without it, historical reports show customers with their current segment, not what they had when they bought. The surrogate key is the mechanism that preserves context — never join on the natural key in historical facts.