How to Build Incremental Loads in ELT: Step by Step

João Barros 04 de July de 2026 4 min read

Incremental loads are the secret to keeping an ELT pipeline fast and cheap: instead of reloading millions of records every night, you process only the rows that changed since the last run. Full reloads waste time and compute, and they get worse as your data grows. The watermark pattern combined with a MERGE statement solves the problem simply and in a timeless way — and it works on SQL Server, Azure SQL, Azure Synapse or the Microsoft Fabric Warehouse.

Prerequisites

A source table with a last-modified column (for example, data_modificacao) or an ever-increasing ID.
A SQL target where you transform the data (SQL Server, Azure SQL, Synapse or Fabric Warehouse).
Permissions to create tables and run INSERT, MERGE and UPDATE.
Basic SQL knowledge and, later, an orchestration tool to schedule it.

Step 1: Create the control (watermark) table

The watermark is simply the highest value already processed — usually a date/time. We store it in a small control table so that each run knows exactly where the previous one stopped. Create the table once and register a low initial value, so the first load brings the full history:

CREATE TABLE dbo.controlo_carga (
    tabela_origem    VARCHAR(128) NOT NULL PRIMARY KEY,
    ultimo_watermark DATETIME2    NOT NULL
);

-- Valor inicial baixo para a 1a carga trazer todo o historico
INSERT INTO dbo.controlo_carga (tabela_origem, ultimo_watermark)
VALUES ('vendas', '1900-01-01');

Step 2: Read the current watermark

At the start of every run, read the stored value. It becomes the lower bound of your data window: we only want rows newer than it. Keep it in a variable to reuse in the next steps.

DECLARE @watermark DATETIME2;

SELECT @watermark = ultimo_watermark
FROM dbo.controlo_carga
WHERE tabela_origem = 'vendas';

Step 3: Extract and load (E + L) the new rows

In ELT, we first bring the raw data into a staging area and only then transform it — this is what sets ELT apart from ETL. Extract only the rows changed since the watermark and load them into a staging table. This drastically cuts the transferred volume and keeps the load fast, even on tables with millions of records.

-- Limpa a staging da execucao anterior
TRUNCATE TABLE stg.vendas;

-- Carrega so o que mudou desde o watermark
INSERT INTO stg.vendas (id_venda, cliente, total, data_modificacao)
SELECT id_venda, cliente, total, data_modificacao
FROM origem.vendas
WHERE data_modificacao > @watermark;

Choose a reliable watermark column: it must increase whenever a row is created or changed. A data_modificacao updated by the application or by a trigger is ideal.

Step 4: Transform with MERGE (upsert)

Now we apply the transformation inside the target, taking advantage of its compute. MERGE performs an upsert: it updates rows that already exist and inserts the new ones, all in a single statement and a single pass over the data. This is the heart of an incremental load and it prevents duplicates.

MERGE INTO dw.vendas AS destino
USING stg.vendas AS origem
    ON destino.id_venda = origem.id_venda
WHEN MATCHED THEN
    UPDATE SET destino.cliente = origem.cliente,
               destino.total   = origem.total,
               destino.data_modificacao = origem.data_modificacao
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id_venda, cliente, total, data_modificacao)
    VALUES (origem.id_venda, origem.cliente, origem.total, origem.data_modificacao);

Step 5: Update the watermark

So the next run starts exactly where this one finished, store the maximum data_modificacao you just processed. Use the staging table as the source and guard against empty runs, so you never move the mark past data you did not load.

UPDATE c
SET c.ultimo_watermark = (SELECT MAX(data_modificacao) FROM stg.vendas)
FROM dbo.controlo_carga AS c
WHERE c.tabela_origem = 'vendas'
  AND EXISTS (SELECT 1 FROM stg.vendas);

Verify the result

Confirm the load worked with three simple checks:

Compare the counts: SELECT COUNT(*) FROM dw.vendas; should reflect the expected records.
Query the control table and confirm that ultimo_watermark advanced to the most recent date.
Run the pipeline a second time without changing source data: staging should end up empty and the MERGE should change no rows. That proves the load is truly incremental and idempotent.

Conclusion

With a control table, a watermark filter and a MERGE, you turned a heavy full load into a light, repeatable incremental load. From here, schedule the process (for example, with Azure Data Factory or a Microsoft Fabric pipeline), add logs so you know how many rows were processed on each run, and handle source deletes with a soft delete column. Which table in your model will be the first you migrate to incremental loads?