Synapse Link for Cosmos DB: operational analytics without ETL
João Barros
15 de July de 2025
1 min read
Azure Synapse Link for Cosmos DB is an HTAP (Hybrid Transactional/Analytical Processing) integration that automatically replicates Cosmos DB data into an analytical store in Synapse, with no impact on transactional operations.
How it works
Cosmos DB (OLTP)
└─ Transactional Store (row-oriented, short TTL)
└─ Analytical Store (column-oriented, auto-sync, long TTL)
↑
Synapse Link (automatic replication, ~2 min latency)
↓
Azure Synapse Analytics
└─ Serverless SQL Pool → T-SQL over Cosmos data
└─ Spark Pool → PySpark over Cosmos data
Enable Synapse Link in Cosmos DB
// Portal: Cosmos DB account → Features → Azure Synapse Link → Enable
// Takes ~5 minutes; not reversible
// Enable per container (via SDK or portal):
az cosmosdb sql container update \
--account-name cosmos-bconcepts \
--database-name operational \
--name orders \
--analytical-storage-ttl -1 // -1 = no expiration
Query with Serverless SQL
-- In Synapse Studio (no pipeline, data updated in ~2 min)
SELECT
JSON_VALUE(doc, '$.customer_id') AS customer_id,
JSON_VALUE(doc, '$.status') AS status,
JSON_VALUE(doc, '$.total_value') AS value,
JSON_VALUE(doc, '$.created_at') AS date
FROM OPENROWSET(
PROVIDER = 'CosmosDB',
CONNECTION = 'account=cosmos-bconcepts;database=operational',
OBJECT = 'orders',
SERVER_CREDENTIAL = 'CosmosDBCredential'
)
WITH (doc NVARCHAR(MAX)) AS rows
WHERE JSON_VALUE(doc, '$.status') = 'Pending';
Query with PySpark
df = spark.read.format("cosmos.olap") \
.option("spark.synapse.linkedService", "CosmosDB_Operational") \
.option("spark.cosmos.container", "orders") \
.load()
df.groupBy("status").agg({"total_value": "sum"}).show()
Conclusion
Synapse Link eliminates the classic pattern of replicating operational data into the data warehouse via nightly ETL. For analysis over near-real-time Cosmos DB data, it is the simplest and most efficient solution available on Azure.