How to handle missing values with pandas in Python
Real-world data rarely arrives complete: a price is missing here, a date there, an empty cell somewhere else. Handling missing values with pandas in Python is one of the most useful skills for anyone working with data, because NaN values can silently skew statistics or cause errors later in an analysis. Below you will see, with simple examples, how to detect, remove and fill those missing values in just a few lines of code.
Prerequisites
- Python 3 installed on your computer.
- The pandas library installed (pip install pandas).
- Familiarity with what a DataFrame is: a table of rows and columns in pandas.
- An editor or notebook of your choice, for example VS Code or Jupyter.
Step 1: Create a sample DataFrame
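So you can practise without relying on a file, start by creating a small table with a few missing values. In pandas, missing values are usually represented as NaN (Not a Number). You can create them with np.nan or with None: in numeric columns None is converted to NaN, while in text columns it may be kept as None, but pandas treats both as missing.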

import pandas as pd
import numpy as np
dados = {
    "nome": ["Ana", "Bruno", "Carla", "Diogo"],
    "idade": [28, np.nan, 35, 41],
    "cidade": ["Lisboa", "Porto", None, "Braga"],
}
df = pd.DataFrame(dados)
print(df)
Notice that Bruno's age and Carla's city are empty. This is exactly the kind of gap we will learn to fix.
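To see how the two ways of writing "missing" behave, here is a small sketch with two throwaway Series (the names numbers and texts are illustrative). It shows that None becomes NaN in a numeric column, stays None in an object column but is still detected by isna(), and that a plain == comparison cannot find NaN because NaN is never equal to itself. The dtype="object" argument pins the text column's type, since newer pandas versions may pick a different default type for strings.

```python
import numpy as np
import pandas as pd

# Numeric column: None is converted to NaN and the dtype stays float64
numbers = pd.Series([1.5, None, 3.0])
print(numbers.dtype)  # float64

# Object (text) column: None is kept as-is, but pandas still counts it as missing
texts = pd.Series(["a", None, "c"], dtype="object")
print(texts.isna().tolist())  # [False, True, False]

# NaN is not equal to itself, so == cannot find it; isna() can
print(np.nan == np.nan)           # False
print((numbers == np.nan).sum())  # 0
print(numbers.isna().sum())       # 1
```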
Step 2: Detect the missing values
Before deciding what to do, it is important to know how many values are missing and in which columns. The isna() method returns True for every empty cell (the isnull() method does exactly the same). Chain sum() onto it and you get the count per column, because each True counts as 1.
# True/False for each cell
print(df.isna())
# Count of missing values per column
print(df.isna().sum())
The result shows 1 in the idade column and 1 in the cidade column. Now you know the size of the problem and can pick the right strategy.
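Beyond the raw count, two other views are often handy: the share of missing values per column and the list of incomplete rows. A minimal sketch on the same sample table (incomplete_rows is just an illustrative name):

```python
import numpy as np
import pandas as pd

# Same sample table as in Step 1
df = pd.DataFrame({
    "nome": ["Ana", "Bruno", "Carla", "Diogo"],
    "idade": [28, np.nan, 35, 41],
    "cidade": ["Lisboa", "Porto", None, "Braga"],
})

# Share of missing values per column: the mean of True/False is the fraction of True
# fractions: nome 0.0, idade 0.25, cidade 0.25
print(df.isna().mean())

# Rows that contain at least one missing value (Bruno and Carla)
incomplete_rows = df[df.isna().any(axis=1)]
print(incomplete_rows)

# Total number of missing cells in the whole table
print(int(df.isna().sum().sum()))  # 2
```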
Step 3: Remove rows with missing values
The most direct approach is to delete the incomplete rows with dropna(). By default it drops every row that contains at least one missing value. It is fast, but risky: if you have little data, you may throw away valuable information.
# Remove any row with at least one missing value
df_sem_falta = df.dropna()
# Remove a row only when the age (idade) is missing
df_com_idade = df.dropna(subset=["idade"])
Use the subset parameter when only one column is truly critical. That way you keep the remaining rows instead of losing them all.
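dropna() has a few more parameters worth knowing. The sketch below, again on the sample table, compares the default behaviour with subset, thresh (the minimum number of non-missing values a row needs in order to be kept) and axis=1 (drop columns instead of rows):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "nome": ["Ana", "Bruno", "Carla", "Diogo"],
    "idade": [28, np.nan, 35, 41],
    "cidade": ["Lisboa", "Porto", None, "Braga"],
})

# Default: drop every row with at least one missing value
print(df.dropna().shape)  # (2, 3)

# Only idade matters: Carla's row survives despite the missing city
print(df.dropna(subset=["idade"])["nome"].tolist())  # ['Ana', 'Carla', 'Diogo']

# Keep rows with at least 2 non-missing values: every row qualifies here
print(df.dropna(thresh=2).shape)  # (4, 3)

# Drop columns (axis=1) that contain any missing value
print(df.dropna(axis=1).columns.tolist())  # ['nome']
```

None of these calls modify df itself; each returns a new DataFrame.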
Step 4: Fill the missing values
Often it is better to fill than to delete. The fillna() method replaces NaN with a value of your choice. Because fillna() returns a new Series instead of changing the original, you have to reassign the result to the column for the change to stick.
# Fill the missing city with a placeholder text ("Desconhecida" = "Unknown")
df["cidade"] = df["cidade"].fillna("Desconhecida")
# Fill the age with the mean of the existing ages
media_idade = df["idade"].mean()
df["idade"] = df["idade"].fillna(media_idade)
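If you prefer to fill several columns in one call, fillna() also accepts a dictionary that maps column names to fill values. A minimal sketch on the sample table, reusing the same values (fill_values and df_filled are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "nome": ["Ana", "Bruno", "Carla", "Diogo"],
    "idade": [28, np.nan, 35, 41],
    "cidade": ["Lisboa", "Porto", None, "Braga"],
})

# One fill value per column: the dict keys are column names
fill_values = {"idade": df["idade"].mean(), "cidade": "Desconhecida"}
df_filled = df.fillna(fill_values)
print(df_filled)

# fillna() returned a new DataFrame, so the original still has its 2 gaps
print(int(df.isna().sum().sum()))  # 2
```

Columns not listed in the dictionary (here nome) are left untouched.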
For time-ordered data, such as a series of daily sales, it often makes more sense to repeat the last known value than to use an average. In that case, use ffill() (fill forward) or bfill() (fill backward). These are alternatives to the mean fill above, so pick one strategy per column:
# Alternative to the mean fill: copy the last valid value down the column
df["idade"] = df["idade"].ffill()
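A minimal sketch of the difference on a time-ordered series, using hypothetical daily sales figures (the values and dates are made up for illustration). It also shows the edge cases: ffill() has nothing to copy into a gap at the very start, and bfill() has nothing to copy into a gap at the very end.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales with two missing days
sales = pd.Series(
    [120.0, np.nan, 95.0, np.nan, 130.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

print(sales.ffill().tolist())  # [120.0, 120.0, 95.0, 95.0, 130.0]
print(sales.bfill().tolist())  # [120.0, 95.0, 95.0, 130.0, 130.0]

# Gaps at the edges: ffill() cannot fill the first one, bfill() cannot fill the last one
edge_gaps = pd.Series([np.nan, 50.0, np.nan])
print(edge_gaps.ffill().tolist())  # [nan, 50.0, 50.0]
print(edge_gaps.bfill().tolist())  # [50.0, 50.0, nan]
```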
Tip: filling with the mean is simple, but the median (median()) is usually more robust when the column contains outliers, because a single extreme value can pull the mean far away from the typical values.
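A quick sketch of why the median helps, using a hypothetical Series with one extreme value (1000) and one gap:

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: one extreme outlier (1000) and one gap
values = pd.Series([10.0, 12.0, 11.0, 13.0, 1000.0, np.nan])

# Both statistics skip NaN by default
print(values.mean())    # 209.2, pulled up by the outlier
print(values.median())  # 12.0, close to the typical values

filled = values.fillna(values.median())
print(filled.tolist())  # [10.0, 12.0, 11.0, 13.0, 1000.0, 12.0]
```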
Check the result
After cleaning the data, confirm that no missing value is left. If everything went well, the count should be 0 in every column.
print(df.isna().sum())
print(df)
If a number greater than zero still appears, check two things: whether you applied the fill to the right column, and whether you reassigned the result, as in df["col"] = df["col"].fillna(...). Forgetting to reassign is one of the most common beginner mistakes.
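The reassignment mistake is easy to reproduce. A minimal sketch on a one-column version of the sample table: calling fillna() on its own computes a filled copy and discards it, while reassigning keeps the change. The final assert is an optional guard you could leave at the end of a cleaning script.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"idade": [28, np.nan, 35, 41]})

# Mistake: the filled Series is computed and then thrown away
df["idade"].fillna(df["idade"].mean())
print(df["idade"].isna().sum())  # 1, nothing changed

# Fix: reassign the result to the column
df["idade"] = df["idade"].fillna(df["idade"].mean())
print(df["idade"].isna().sum())  # 0

# Guard: fails loudly if any missing value is left anywhere in the table
assert not df.isna().any().any(), "There are still missing values"
```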
Conclusion
You now have the essentials for handling missing values with pandas: detect with isna(), remove with dropna(), and fill with fillna(), ffill() or bfill(). The next step is deciding the strategy case by case, because deleting is not always the best option and the mean is not always the right value. Looking at your own data, which of these approaches makes the most sense?