Medallion architecture in Microsoft Fabric: Bronze, Silver and Gold
The Medallion architecture (or Lakehouse Architecture) organizes data into three layers of increasing quality: Bronze (raw), Silver (clean) and Gold (ready for consumption). It is the reference pattern for modern data engineering and fits naturally into Microsoft Fabric.
Bronze layer — Raw
Data ingested as it arrives from the source, without transformations. The goal is fidelity and traceability.
# Notebook: Bronze ingestion
df_raw = spark.read.format("csv").option("header","true").load("Files/raw/sales_*.csv")
df_raw = df_raw.withColumn("_ingest_ts", current_timestamp()) \
.withColumn("_source_file", input_file_name())
df_raw.write.format("delta").mode("append").save("Tables/bronze_sales")
Silver layer — Clean and Enriched
Transformed data: deduplication, type conversion, enrichment with reference dimensions.
# Notebook: Silver transformation
from pyspark.sql.functions import col, to_date, upper, trim
df_silver = spark.read.format("delta").load("Tables/bronze_sales") \
.dropDuplicates(["sale_id"]) \
.withColumn("sale_date", to_date(col("date_str"), "dd/MM/yyyy")) \
.withColumn("country", upper(trim(col("country")))) \
.filter(col("amount") > 0)
df_silver.write.format("delta").mode("overwrite") \
.option("overwriteSchema","true").save("Tables/silver_sales")
Gold layer — Aggregated for Consumption
Data modeled for consumption by BI, data science or APIs. Usually in a Warehouse or as highly optimized Delta tables.
-- SQL in the Gold Warehouse
CREATE TABLE gold.fact_sales_monthly AS
SELECT
DATE_TRUNC('month', sale_date) AS month,
country,
SUM(amount) AS total_revenue,
COUNT(*) AS num_transactions
FROM silver_sales
GROUP BY 1, 2
Conclusion
The Medallion architecture makes data more reliable, pipelines simpler to maintain and collaboration between teams more efficient. In Fabric, each layer is naturally a separate Lakehouse, sharing the same OneLake with shortcuts.