Unity Catalog in Databricks: unified governance for data and AI
João Barros
January 1, 2025
Unity Catalog is the Databricks governance layer for data and AI assets. It unifies access control, lineage, and auditing in one metastore shared by every workspace attached to it, and it replaces the per-workspace Hive metastore with a centralized, multi-workspace catalog.
Object hierarchy
Metastore (1 per region)
└─ Catalog (e.g. prod, dev, raw)
   └─ Schema / Database
      └─ Table / View / Volume / Function / Model
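The hierarchy maps directly onto a three-level name of the form catalog.schema.object. A minimal sketch of how that name is used in queries, assuming the prod.sales.fact_orders table that this article creates in the next section:

```sql
-- Fully qualified three-level name: catalog.schema.table
SELECT * FROM prod.sales.fact_orders LIMIT 10;

-- Or set session defaults and use the short name
USE CATALOG prod;
USE SCHEMA sales;
SELECT current_catalog(), current_schema();
SELECT * FROM fact_orders LIMIT 10;
```

Fully qualified names are usually the safer choice in shared notebooks and jobs, because they do not depend on whatever default catalog the session happens to have.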
Create a basic structure
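-- SQL in Databricks
-- Create the catalog and a schema inside it
CREATE CATALOG IF NOT EXISTS prod;
CREATE SCHEMA IF NOT EXISTS prod.sales;
-- Copy a legacy table from the workspace Hive metastore into a
-- Unity Catalog managed Delta table (CTAS copies the data)
CREATE TABLE prod.sales.fact_orders
USING DELTA AS SELECT * FROM hive_metastore.legacy.orders;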
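To confirm that the objects landed where expected, a quick check along these lines may help. It is only a sketch and uses no names beyond the ones created above:

```sql
-- List the schemas in the catalog and the tables in the schema
SHOW SCHEMAS IN prod;
SHOW TABLES IN prod.sales;

-- Inspect the table's metadata (columns, table type, storage location, owner)
DESCRIBE TABLE EXTENDED prod.sales.fact_orders;
```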
Granular access control
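-- Grant read access to one table. The group also needs USE CATALOG and
-- USE SCHEMA on the parent objects before it can query the table.
GRANT USE CATALOG ON CATALOG prod TO `analysts`;
GRANT USE SCHEMA ON SCHEMA prod.sales TO `analysts`;
GRANT SELECT ON TABLE prod.sales.fact_orders TO `analysts`;
-- Read access to a full schema: SELECT granted on the schema is inherited
-- by its current and future tables and views
GRANT USE CATALOG ON CATALOG prod TO `data_team`;
GRANT USE SCHEMA, SELECT ON SCHEMA prod.sales TO `data_team`;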
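-- Mask a sensitive column. The mask function receives the column value as its
-- first argument; USING COLUMNS accepts only other columns or literals, so
-- identity checks such as current_user() belong inside the function body.
ALTER TABLE prod.sales.customers
ALTER COLUMN tax_id SET MASK mask_pii;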
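The masking statement references a function named mask_pii without defining it, and the function has to exist before the mask is applied. A minimal sketch of what it could look like follows. Its placement in prod.sales, the `pii_readers` group, and the placeholder string are all assumptions made for illustration, and the customers table is taken to have a STRING column tax_id.

```sql
-- Hypothetical mask function: members of the assumed `pii_readers` group
-- see the real value, everyone else sees a redacted placeholder
CREATE OR REPLACE FUNCTION prod.sales.mask_pii(tax_id STRING)
RETURNS STRING
RETURN CASE
  WHEN is_account_group_member('pii_readers') THEN tax_id
  ELSE '***-REDACTED-***'
END;

-- Review which privileges are currently granted on the table
SHOW GRANTS ON TABLE prod.sales.customers;
```

Putting the group check inside the function keeps the policy in one place: changing who may see the raw value means editing the function or the group membership, not every table that uses the mask.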
Automatic lineage
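Unity Catalog captures lineage automatically, down to the column level, for queries run on Databricks in any supported language (SQL, Python, Scala, R), including Delta Live Tables pipelines. To view it, open the table in Catalog Explorer (formerly Data Explorer), select the Lineage tab, and open the lineage graph.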
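Lineage can also be queried with SQL when system tables are enabled for the account. The sketch below assumes the `system.access.table_lineage` table is available and readable by the current user. The column names follow the documented schema as best I recall it, so check them against your workspace before relying on them.

```sql
-- Which upstream tables fed prod.sales.fact_orders, most recent first
SELECT source_table_full_name,
       target_table_full_name,
       entity_type,
       event_time
FROM system.access.table_lineage
WHERE target_table_full_name = 'prod.sales.fact_orders'
ORDER BY event_time DESC
LIMIT 20;
```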
External Locations and Volumes
-- Register external storage
CREATE EXTERNAL LOCATION my_adls
URL 'abfss://container@account.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL my_credential);
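-- External volume for non-tabular files. The path must sit under a registered
-- external location, and the schema prod.raw must already exist.
CREATE EXTERNAL VOLUME prod.raw.incoming_files
LOCATION 'abfss://container@account.dfs.core.windows.net/incoming/';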
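Once the volume exists, its files are addressed through a path of the form /Volumes/<catalog>/<schema>/<volume>/. A short sketch of granting access and reading from it follows, assuming the volume holds CSV files with a header row (the file format is not stated in this article). The grantee also needs USE CATALOG on prod and USE SCHEMA on prod.raw.

```sql
-- Let a group read files in the volume
GRANT READ VOLUME ON VOLUME prod.raw.incoming_files TO `data_team`;

-- List the files, then read them as a table (CSV with header is an assumption)
LIST '/Volumes/prod/raw/incoming_files/';

SELECT *
FROM read_files(
  '/Volumes/prod/raw/incoming_files/',
  format => 'csv',
  header => true
)
LIMIT 10;
```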
Conclusion
Unity Catalog gives Databricks a single governance layer for the whole organization. With one catalog shared across workspaces, it eliminates permission silos and lets data teams see who accessed which data and where that data came from.