What is ETL and how to design your first data pipeline
ETL stands for Extract, Transform, Load — the process that moves data from several sources into an analytical destination. Understanding the three steps is the basis for building any data pipeline.
Prerequisites
- Knowledge of SQL and where your data lives (files, databases, APIs).
- A destination for the data (a data warehouse or analytical database).
- An ETL tool or a language such as Python.
Step 1: Extract
Read the data from the source as it is, without changing it. It can be a file, a table or an API:

SELECT id, cliente, valor, data
FROM origem.vendas
WHERE data >= '2023-01-01';
Step 2: Transform
Clean and adapt the data to the destination rules: fix types, remove duplicates, normalise text, compute columns. This is where most of the work lives.
Step 3: Load
Write the transformed data into the destination, usually incrementally (only what is new):
INSERT INTO dw.fact_vendas (cliente_key, valor, data_key)
SELECT cliente_key, valor, data_key
FROM staging.vendas_limpo;
Step 4: Orchestrate and schedule
A pipeline runs repeatedly. Schedule it (daily, hourly) and log each run so you know whether it succeeded.
Verify the result
Compare the row count in the source and the destination and check a few values by hand. The totals should match.
Conclusion
Extract, Transform, Load is a simple but powerful pattern. Once you master the basic flow, the next step is to make it incremental and monitored. Which data source will you integrate first?