Kimball dimensional modeling: a practical guide to facts and dimensions
João Barros
25 de October de 2024
2 min read
Ralph Kimball's dimensional methodology has been the reference standard for data warehousing for over 30 years. Its principles — facts, dimensions and Bus Architecture — apply equally well in traditional SQL Server, Azure Synapse and Microsoft Fabric.
Facts vs Dimensions
FACT TABLE:
- Contains numeric measures (revenue, quantity, cost)
- Many rows, few columns
- Foreign keys to all dimensions
- Defined granularity: 1 row = 1 sales transaction
DIMENSION TABLE:
- Contains descriptive attributes (name, category, region)
- Few rows, many columns
- Surrogate key (INT) as PK — never the natural key
- Denormalized (Kimball dislikes snow-flake)
Define granularity — the most important decision
Question: "What does one row of the fact table represent?"
Wrong granularity: "sales data"
Correct granularity: "one row per sales line item on each invoice"
Impact: if granularity is "per invoice" and the user wants
analysis by product, it is impossible. Always pick the most
atomic granularity available in the source.
Typical structure — star schema
-- Customer dimension (denormalized)
CREATE TABLE dbo.DimCustomer (
sk_customer INT IDENTITY PRIMARY KEY, -- surrogate key
nk_customer INT NOT NULL, -- natural key from source
name NVARCHAR(200),
segment VARCHAR(50),
country VARCHAR(100),
region VARCHAR(100), -- denormalized (do not normalize into DimCountry)
valid_from DATE NOT NULL,
valid_to DATE NOT NULL,
is_current BIT NOT NULL DEFAULT 1
);
-- Fact table
CREATE TABLE dbo.FactSales (
sk_date INT NOT NULL REFERENCES dbo.DimDate(sk_date),
sk_customer INT NOT NULL REFERENCES dbo.DimCustomer(sk_customer),
sk_product INT NOT NULL REFERENCES dbo.DimProduct(sk_product),
sk_store INT NOT NULL REFERENCES dbo.DimStore(sk_store),
revenue DECIMAL(18,2) NOT NULL,
cost DECIMAL(18,2) NOT NULL,
quantity INT NOT NULL
);
Date table — mandatory
-- Generate a date table with all required attributes
CREATE TABLE dbo.DimDate (
sk_date INT PRIMARY KEY, -- YYYYMMDD format as INT
full_date DATE NOT NULL,
year SMALLINT,
quarter TINYINT,
month TINYINT,
month_name VARCHAR(20),
week TINYINT,
day_of_week TINYINT,
day_name VARCHAR(20),
is_weekend BIT,
is_holiday BIT
);
Conclusion
Kimball works because it aligns the model with the way business users think about data. Each dimension is an analysis perspective, each fact is a business event. This alignment makes reports more intuitive and queries simpler to write and optimize.