How to read and clean CSV data with pandas in Python
Working with data often starts with a CSV file. The pandas library in Python is one of the most convenient ways to read, inspect and clean it before any analysis.
Prerequisites
- Python 3.9 or later installed.
- The pandas library: pip install pandas.
- A sample CSV file (for example vendas.csv).
Step 1: Read the CSV
Import pandas and load the file into a DataFrame:

import pandas as pd
df = pd.read_csv("vendas.csv")
print(df.head())
The head() method shows the first five rows so you can confirm the data was read correctly.
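To try this step without a file on disk, a minimal sketch can read the same kind of data from an in-memory string. The column names (data, cliente, preco) are the ones used in the later steps; the sample rows are invented for illustration and are not taken from a real vendas.csv.

```python
import io

import pandas as pd

# Hypothetical sample standing in for vendas.csv; the rows are made up.
csv_text = """data,cliente,preco
05/01/2024,Ana,10.5
06/01/2024,Bruno,
06/01/2024,Bruno,
07/01/2024,,7.0
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first five rows (this sample only has four)
print(df.shape)    # (number of rows, number of columns)
```

Empty fields in the text become NaN in the DataFrame, which is what the inspection step counts.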
Step 2: Inspect the data
Before cleaning, understand what you have:
df.info()  # prints its summary directly; it returns None, so no print() is needed
print(df.isnull().sum())
info() lists each column's type and non-null count, and isnull().sum() counts missing values per column.
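As an illustration of what the inspection reveals, the sketch below builds a small DataFrame by hand (the values are hypothetical) and stores the counts in variables so they can be checked. Counting duplicated rows with duplicated().sum() is an extra check added here, not part of the steps above.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same columns as the tutorial's file.
df = pd.DataFrame({
    "data": ["05/01/2024", "06/01/2024", "06/01/2024", "07/01/2024"],
    "cliente": ["Ana", "Bruno", "Bruno", None],
    "preco": [10.5, np.nan, np.nan, 7.0],
})

df.info()                          # column types and non-null counts

missing = df.isnull().sum()        # Series: missing values per column
print(missing)

duplicates = df.duplicated().sum() # rows identical to an earlier row
print(duplicates)
```

Here the second and third rows are identical, so one of them counts as a duplicate.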
Step 3: Clean missing values and duplicates
df = df.drop_duplicates()
df["preco"] = df["preco"].fillna(0)
df = df.dropna(subset=["cliente"])
We drop repeated rows, fill missing prices with 0 and discard rows without a customer.
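The effect of each cleaning call is easier to see on a tiny example. This is a sketch on invented data, with the row count noted after each step.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one repeated row, two missing prices, one missing customer.
df = pd.DataFrame({
    "data": ["05/01/2024", "06/01/2024", "06/01/2024", "07/01/2024"],
    "cliente": ["Ana", "Bruno", "Bruno", None],
    "preco": [10.5, np.nan, np.nan, 7.0],
})

df = df.drop_duplicates()            # 4 rows -> 3 (second Bruno row removed)
df["preco"] = df["preco"].fillna(0)  # Bruno's missing price becomes 0
df = df.dropna(subset=["cliente"])   # 3 rows -> 2 (row without a customer removed)

print(df)
```

Filling with 0 is only one option: a zero price lowers any average computed later, so for some datasets dropping the row or filling with a median may fit better.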
Step 4: Fix the data types
df["data"] = pd.to_datetime(df["data"], format="%d/%m/%Y")
df["preco"] = df["preco"].astype(float)
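Real files often contain values that do not match the expected format. The sketch below, on invented data, assumes you would rather mark unparseable dates as missing than stop with an error; errors="coerce" is an optional argument of pd.to_datetime and is not used in the step above.

```python
import pandas as pd

# Hypothetical data: prices stored as text and one invalid date.
df = pd.DataFrame({
    "data": ["05/01/2024", "06/01/2024", "not a date"],
    "preco": ["10.5", "0", "7"],
})

# errors="coerce" turns values that do not match the format into NaT
# instead of raising an exception.
df["data"] = pd.to_datetime(df["data"], format="%d/%m/%Y", errors="coerce")

# astype(float) works here because every string is a valid number.
df["preco"] = df["preco"].astype(float)

print(df.dtypes)
```

If the price column may contain non-numeric text, astype(float) raises a ValueError; pd.to_numeric(df["preco"], errors="coerce") is a comparable option that yields NaN for those entries.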
Verify the result
Run df.info() and df.isnull().sum() again. Essential columns should have no missing values and dates should appear as datetime.
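The same checks can be written as assertions so that a script stops if the data is not clean. This is a sketch on a hypothetical, already cleaned DataFrame; the list of essential columns is an assumption and should match your own file.

```python
import pandas as pd

# Hypothetical cleaned data.
df = pd.DataFrame({
    "data": pd.to_datetime(["05/01/2024", "06/01/2024"], format="%d/%m/%Y"),
    "cliente": ["Ana", "Bruno"],
    "preco": [10.5, 0.0],
})

essential = ["cliente", "preco"]  # assumed choice of essential columns

# No missing values in the essential columns.
assert df[essential].isnull().sum().sum() == 0
# The date column has a datetime type.
assert pd.api.types.is_datetime64_any_dtype(df["data"])
# No repeated rows remain.
assert not df.duplicated().any()

checks_passed = True
print("All checks passed")
```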
Conclusion
With a handful of pandas lines you turn a raw CSV into a reliable dataset, ready for analysis. What other transformation do you usually need on your files before analysing them?