
How to read and clean CSV data with pandas in Python

João Barros, 16 May 2023, 2 min read

Working with data often starts with a CSV file. The pandas library for Python lets you read, inspect and clean that file in a few lines before any analysis.

Prerequisites

  • Python 3.9 or later installed.
  • The pandas library: pip install pandas.
  • A sample CSV file (for example vendas.csv) with the columns used below: data (date), cliente (customer) and preco (price).

Step 1: Read the CSV

Import pandas and load the file into a DataFrame:

import pandas as pd

df = pd.read_csv("vendas.csv")
print(df.head())

The head() method shows the first five rows so you can confirm the data was read correctly.
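To try the steps without a real file, here is a minimal sketch that passes read_csv an in-memory string through io.StringIO instead of vendas.csv. The four rows are invented for illustration (they include a repeated row, a missing price and a missing customer) and assume the data, cliente and preco columns used in this guide.

```python
import io

import pandas as pd

# Invented sample standing in for vendas.csv (assumption: the real file
# has at least the columns data, cliente and preco used in this guide).
csv_text = """data,cliente,preco
16/05/2023,Ana,10.5
17/05/2023,Rui,
17/05/2023,Rui,
18/05/2023,,7.25
"""

# read_csv accepts any file-like object, so StringIO replaces the file path here.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print(df.shape)  # (number of rows, number of columns)
```

Empty fields in the text become NaN, which is what the next steps detect and clean.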

Step 2: Inspect the data

Before cleaning, understand what you have:

df.info()
print(df.isnull().sum())

info() lists each column's type and number of non-null values, and isnull().sum() counts the missing values in each column.
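A sketch of what the inspection reports on a small, invented sample with the same assumed columns. The df.duplicated().sum() line is an extra check, not part of the original steps; it counts rows that repeat an earlier row, which Step 3 then removes.

```python
import io

import pandas as pd

# Invented sample data, for illustration only.
csv_text = """data,cliente,preco
16/05/2023,Ana,10.5
17/05/2023,Rui,
17/05/2023,Rui,
18/05/2023,,7.25
"""
df = pd.read_csv(io.StringIO(csv_text))

df.info()                    # prints dtypes and non-null counts (returns None)
missing = df.isnull().sum()  # Series: column name -> number of missing values
print(missing)
print(df.duplicated().sum())  # rows identical to an earlier row
```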

Step 3: Clean missing values and duplicates

df = df.drop_duplicates()
df["preco"] = df["preco"].fillna(0)
df = df.dropna(subset=["cliente"])

We drop repeated rows, fill missing prices with 0 and discard rows without a customer.
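Applied to an invented four-row sample (same assumed columns), the three cleaning lines behave as sketched below. The final reset_index(drop=True) call is an optional addition that renumbers the rows after deletions.

```python
import io

import pandas as pd

# Invented sample: one repeated row, one missing price, one missing customer.
csv_text = """data,cliente,preco
16/05/2023,Ana,10.5
17/05/2023,Rui,
17/05/2023,Rui,
18/05/2023,,7.25
"""
df = pd.read_csv(io.StringIO(csv_text))

df = df.drop_duplicates()             # 4 rows -> 3 (the repeated Rui row goes)
df["preco"] = df["preco"].fillna(0)   # Rui's missing price becomes 0.0
df = df.dropna(subset=["cliente"])    # drops the row with no customer
df = df.reset_index(drop=True)        # optional: renumber rows 0..n-1

print(df)
```

The order matters here: filling prices before dropping rows without a customer gives the same result in this case, but dropping duplicates first avoids counting repeated rows twice.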

Step 4: Fix the data types

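# Parse day/month/year text into datetime values
df["data"] = pd.to_datetime(df["data"], format="%d/%m/%Y")
# Store prices as floating-point numbers
df["preco"] = df["preco"].astype(float)

The explicit format tells pandas the dates are day/month/year, so 05/06/2023 is read as 5 June rather than 6 May.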
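A small sketch of the type conversion, assuming the dates arrive as day/month/year text and the prices as numeric strings (both invented here). Printing df.dtypes afterwards confirms the change.

```python
import pandas as pd

# Invented example values; a real file may contain other formats.
df = pd.DataFrame(
    {
        "data": ["16/05/2023", "17/05/2023"],
        "cliente": ["Ana", "Rui"],
        "preco": ["10.5", "0"],  # prices stored as text, e.g. from a messy CSV
    }
)

# An explicit format avoids guessing between day-first and month-first dates.
df["data"] = pd.to_datetime(df["data"], format="%d/%m/%Y")
# astype(float) converts numeric strings; non-numeric text raises an error.
df["preco"] = df["preco"].astype(float)

print(df.dtypes)
print(df["data"].dt.month.tolist())
```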

Verify the result

Run df.info() and df.isnull().sum() again. The cliente and preco columns should now report no missing values, and data should appear with a datetime64 type rather than as text.
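The same checks can be automated. The helper below, check_clean, is a hypothetical function written for illustration, not part of pandas; it relies on pandas.api.types.is_datetime64_any_dtype and assumes the column names data, cliente and preco.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def check_clean(df: pd.DataFrame, required=("cliente", "preco"), date_col="data") -> list[str]:
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper: the default column names follow this guide.
    """
    problems = []
    for col in required:
        # Essential columns must have no missing values left.
        n_missing = int(df[col].isnull().sum())
        if n_missing:
            problems.append(f"{col}: {n_missing} missing value(s)")
    # The date column should hold datetime values, not text.
    if not is_datetime64_any_dtype(df[date_col]):
        problems.append(f"{date_col}: dtype is {df[date_col].dtype}, not datetime")
    return problems


# Usage on two invented DataFrames.
clean = pd.DataFrame(
    {
        "data": pd.to_datetime(["16/05/2023"], format="%d/%m/%Y"),
        "cliente": ["Ana"],
        "preco": [10.5],
    }
)
raw = pd.DataFrame({"data": ["16/05/2023"], "cliente": [None], "preco": [10.5]})

print(check_clean(clean))  # []
print(check_clean(raw))    # two problems: missing cliente, dates still text
```

Returning a list instead of raising on the first failure reports every issue in a single run.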

Conclusion

With a handful of pandas lines you turn a raw CSV into a reliable dataset, ready for analysis. What other transformation do you usually need on your files before analysing them?
