Reducing Power BI model size: how VertiPaq works
Power BI uses the VertiPaq engine (also called xVelocity), an in-memory columnar engine. Understanding how it compresses data is the key to building small, fast models.
Compression by column
VertiPaq uses two main algorithms:
- Run-Length Encoding (RLE) — effective for columns with repeated values (e.g. Status with "Active"/"Inactive").
- Hash Encoding — builds a dictionary of unique values; each row stores only the dictionary index.
A column with 5 unique values across 10 million rows compresses far better than a column with 1 million unique values.
Spot problems with DAX Studio
DAX Studio is free and indispensable. Connect to the model and use Advanced → View Metrics:
-- See columns by size
SELECT TOP 20
[TableName], [ColumnName],
[TotalSize], [DataSize], [DictionarySize]
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
ORDER BY [TotalSize] DESC
Reduction actions
- Remove columns not used in reports or DAX measures.
- Reduce decimal precision (DECIMAL(18,2) vs DECIMAL(18,6)).
- Use correct data types: INT instead of STRING for surrogate keys.
- Disable auto date hierarchy in File → Options → Data Load.
- Use incremental refresh for large fact tables instead of importing everything.
Benchmark
A well-optimized model with 50 million rows can take less than 500 MB in memory. The same un-optimized model can take 5 GB. The difference is mostly in unnecessary columns and incorrect data types.
Conclusion
Knowing VertiPaq is not academic — it is practical. Smaller models refresh faster, cost less in Premium/Fabric, and users are happier with reports that open in seconds.