Data Vault 2.0: flexibility and traceability for modern warehouses
João Barros
13 de August de 2025
1 min read
Data Vault 2.0 is a modeling methodology for raw data vaults that prioritizes auditability, load parallelism and adaptability to business change. It does not replace Kimball — the two coexist: Data Vault in the raw/silver layer, dimensional in the gold/presentation layer.
The 3 table types
HUB — unique business identity
Contains: hash key + business key + load metadata
e.g. HUB_Customer (hk_customer, nk_customer_erp, load_dts, rec_src)
LINK — relationships between Hubs
Contains: link hash key + hk of each related Hub + metadata
e.g. LNK_Sale (hk_sale, hk_customer, hk_product, load_dts, rec_src)
SATELLITE — descriptive attributes and history
Contains: parent hk + hash diff + attributes + metadata
e.g. SAT_Customer_CRM (hk_customer, load_dts, hash_diff, name, email, segment)
Hub implementation
CREATE TABLE dbo.HUB_Customer (
hk_customer BINARY(16) NOT NULL PRIMARY KEY, -- MD5/SHA-1 of the business key
nk_customer INT NOT NULL,
load_dts DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
rec_src VARCHAR(50)NOT NULL -- source system
);
-- Insert only new ones (no updates — Data Vault is insert-only)
INSERT INTO dbo.HUB_Customer (hk_customer, nk_customer, rec_src)
SELECT HASHBYTES('MD5', CAST(nk_customer AS VARCHAR)), nk_customer, 'ERP_SAP'
FROM stg.Customers src
WHERE NOT EXISTS (SELECT 1 FROM dbo.HUB_Customer WHERE hk_customer = HASHBYTES('MD5', CAST(src.nk_customer AS VARCHAR)));
Satellite with change detection
-- Inserts only when attributes change (hash_diff)
INSERT INTO dbo.SAT_Customer_CRM (hk_customer, load_dts, hash_diff, name, email, segment, rec_src)
SELECT
HASHBYTES('MD5', CAST(src.nk_customer AS VARCHAR)) AS hk_customer,
SYSDATETIME(),
HASHBYTES('MD5', CONCAT(src.name, '|', src.email, '|', src.segment)) AS hash_diff,
src.name, src.email, src.segment, 'CRM'
FROM stg.Customers src
LEFT JOIN (SELECT hk_customer, hash_diff FROM dbo.SAT_Customer_CRM WHERE load_dts = (SELECT MAX(load_dts) FROM dbo.SAT_Customer_CRM s2 WHERE s2.hk_customer = dbo.SAT_Customer_CRM.hk_customer)) latest
ON HASHBYTES('MD5', CAST(src.nk_customer AS VARCHAR)) = latest.hk_customer
WHERE latest.hash_diff IS NULL OR latest.hash_diff <> HASHBYTES('MD5', CONCAT(src.name, '|', src.email, '|', src.segment));
Conclusion
Data Vault 2.0 shines in environments with many sources, strong audit requirements and frequently changing schemas. The insert-only model with hash keys enables parallel loading of hundreds of tables and full traceability. Combine it with a dimensional layer for BI consumption.