(+351) 21 24 10006  ·  info@bconcepts.pt
Carnaxide, Lisbon
Modelação de Dados (Kimball)
Modelação de Dados (Kimball) 1 min

Data Vault 2.0: flexibility and traceability for modern warehouses

João Barros 13 de August de 2025 1 min read

Data Vault 2.0 is a modeling methodology for raw data vaults that prioritizes auditability, load parallelism and adaptability to business change. It does not replace Kimball — the two coexist: Data Vault in the raw/silver layer, dimensional in the gold/presentation layer.

The 3 table types

HUB — unique business identity
  Contains: hash key + business key + load metadata
  e.g. HUB_Customer (hk_customer, nk_customer_erp, load_dts, rec_src)

LINK — relationships between Hubs
  Contains: link hash key + hk of each related Hub + metadata
  e.g. LNK_Sale (hk_sale, hk_customer, hk_product, load_dts, rec_src)

SATELLITE — descriptive attributes and history
  Contains: parent hk + hash diff + attributes + metadata
  e.g. SAT_Customer_CRM (hk_customer, load_dts, hash_diff, name, email, segment)

Hub implementation

CREATE TABLE dbo.HUB_Customer (
    hk_customer BINARY(16) NOT NULL PRIMARY KEY,  -- MD5/SHA-1 of the business key
    nk_customer INT        NOT NULL,
    load_dts    DATETIME2  NOT NULL DEFAULT SYSDATETIME(),
    rec_src     VARCHAR(50)NOT NULL                -- source system
);

-- Insert only new ones (no updates — Data Vault is insert-only)
INSERT INTO dbo.HUB_Customer (hk_customer, nk_customer, rec_src)
SELECT HASHBYTES('MD5', CAST(nk_customer AS VARCHAR)), nk_customer, 'ERP_SAP'
FROM stg.Customers src
WHERE NOT EXISTS (SELECT 1 FROM dbo.HUB_Customer WHERE hk_customer = HASHBYTES('MD5', CAST(src.nk_customer AS VARCHAR)));

Satellite with change detection

-- Inserts only when attributes change (hash_diff)
INSERT INTO dbo.SAT_Customer_CRM (hk_customer, load_dts, hash_diff, name, email, segment, rec_src)
SELECT
    HASHBYTES('MD5', CAST(src.nk_customer AS VARCHAR)) AS hk_customer,
    SYSDATETIME(),
    HASHBYTES('MD5', CONCAT(src.name, '|', src.email, '|', src.segment)) AS hash_diff,
    src.name, src.email, src.segment, 'CRM'
FROM stg.Customers src
LEFT JOIN (SELECT hk_customer, hash_diff FROM dbo.SAT_Customer_CRM WHERE load_dts = (SELECT MAX(load_dts) FROM dbo.SAT_Customer_CRM s2 WHERE s2.hk_customer = dbo.SAT_Customer_CRM.hk_customer)) latest
    ON HASHBYTES('MD5', CAST(src.nk_customer AS VARCHAR)) = latest.hk_customer
WHERE latest.hash_diff IS NULL OR latest.hash_diff <> HASHBYTES('MD5', CONCAT(src.name, '|', src.email, '|', src.segment));

Conclusion

Data Vault 2.0 shines in environments with many sources, strong audit requirements and frequently changing schemas. The insert-only model with hash keys enables parallel loading of hundreds of tables and full traceability. Combine it with a dimensional layer for BI consumption.

Share: