(+351) 21 24 10006  ·  info@bconcepts.pt
Carnaxide, Lisbon
Azure Synapse Analytics
Azure Synapse Analytics 1 min

Synapse Link for Cosmos DB: operational analytics without ETL

João Barros 15 de July de 2025 1 min read

Azure Synapse Link for Cosmos DB is an HTAP (Hybrid Transactional/Analytical Processing) integration that automatically replicates Cosmos DB data into an analytical store in Synapse, with no impact on transactional operations.

How it works

Cosmos DB (OLTP)
  └─ Transactional Store (row-oriented, short TTL)
  └─ Analytical Store (column-oriented, auto-sync, long TTL)
            ↑
            Synapse Link (automatic replication, ~2 min latency)
            ↓
  Azure Synapse Analytics
    └─ Serverless SQL Pool → T-SQL over Cosmos data
    └─ Spark Pool → PySpark over Cosmos data

Enable Synapse Link in Cosmos DB

// Portal: Cosmos DB account → Features → Azure Synapse Link → Enable
// Takes ~5 minutes; not reversible

// Enable per container (via SDK or portal):
az cosmosdb sql container update \
  --account-name cosmos-bconcepts \
  --database-name operational \
  --name orders \
  --analytical-storage-ttl -1  // -1 = no expiration

Query with Serverless SQL

-- In Synapse Studio (no pipeline, data updated in ~2 min)
SELECT
    JSON_VALUE(doc, '$.customer_id') AS customer_id,
    JSON_VALUE(doc, '$.status')      AS status,
    JSON_VALUE(doc, '$.total_value') AS value,
    JSON_VALUE(doc, '$.created_at')  AS date
FROM OPENROWSET(
    PROVIDER = 'CosmosDB',
    CONNECTION = 'account=cosmos-bconcepts;database=operational',
    OBJECT = 'orders',
    SERVER_CREDENTIAL = 'CosmosDBCredential'
)
WITH (doc NVARCHAR(MAX)) AS rows
WHERE JSON_VALUE(doc, '$.status') = 'Pending';

Query with PySpark

df = spark.read.format("cosmos.olap") \
    .option("spark.synapse.linkedService", "CosmosDB_Operational") \
    .option("spark.cosmos.container", "orders") \
    .load()

df.groupBy("status").agg({"total_value": "sum"}).show()

Conclusion

Synapse Link eliminates the classic pattern of replicating operational data into the data warehouse via nightly ETL. For analysis over near-real-time Cosmos DB data, it is the simplest and most efficient solution available on Azure.

Share: