Pandas where() – Select Rows based on Values in a Dataframe Column

Pandas where(): In this new tutorial we will see how to use the where() function on a column of a dataframe of the pandas module.
Introduction
A pandas dataframe is a two-dimensional tabular data structure that can be modified in size with labeled axes that are commonly referred to as row and column labels, with different arithmetic operations aligned with the row and column labels.
The Pandas library, available on python, allows to import data and to make quick analysis on loaded data.
The where() function allows you to replace the values for which your condition is False. The main purpose of this function is to replace values that do not satisfy one or more criteria. By default, The rows not satisfying the condition are filled with NaN value.
This is a function we don’t use often, but in some cases, it can be very useful instead of using the .apply() function.
In this tutorial we will see the following topics:
- Pandas Where with DataFrame single condition
- Where() with DataFrame multiple condition
- Pandas Where with Series condition
- Pandas Where with Callable condition
To illustrate these different points, we will use the following dataframe:
import pandas as pd
pokedex = [(10, 40, 40, 20),
(20, 50, 10, 30),
(20, 50, 70, 50),
(30, 20, 30, 70),
(40, 10, 50, 70)
]
df = pd.DataFrame(pokedex,
columns=['Power1', 'Power2', 'Power3', 'Power4'],
index=['Bulbasaur', 'Charmander', 'Squirtle', 'Blastoise','Caterpie'])
print(df)
Output :
Power1 Power2 Power3 Power4
Bulbasaur 10 40 40 20
Charmander 20 50 10 30
Squirtle 20 50 70 50
Blastoise 30 20 30 70
Caterpie 40 10 50 70
Pandas where()
Pandas where() Syntax
The syntax of the where() function is as follows:
DataFrame.where(cond,
other= NaN,
inplace= False,
axis= None,
level= None,
errors= 'raise',
try_cast= False)
This function can take several parameters. Here is a detailed description of each parameter:
Name | Description | Type | Default Value | Required |
---|---|---|---|---|
cond | The cond parameter is the place where the condition to be checked will be satisfied. | boolean, Series or DataFrame , an array-like structure or a callable | – | Yes |
other | All values that do not satisfy the previously given condition will be treated accordingly by this other parameter. | scalar, Series /DataFrame , or a callable | NaN | No |
inplace | Whether to perform the operation in place on the data. | Boolean | False | No |
axis | Alignment axis if needed. | Integer | None | No |
level | Alignment level if needed. | Integer | None | No |
errors | String | raise | No | |
try_cast | Try to cast the result back to the input type (if possible) | Boolean | False | No |
Pandas where() with Single Condition
To start off, we will filter in our dataframe all values that are greater than 60. To know all the values greater than 60 in our dataframe we can use the following condition:
print(df<60)
Power1 Power2 Power3 Power4
Bulbasaur True True True True
Charmander True True True True
Squirtle True True False True
Blastoise True True True False
Caterpie True True True False
In this example, we see that 3 values are equal to False. To replace all the values that are greater than 60 by the value ‘High’, we can use the following row:
df = df.where(df<60, "High")
print(df)
Power1 Power2 Power3 Power4
Bulbasaur 10 40 40 20
Charmander 20 50 10 30
Squirtle 20 50 High 50
Blastoise 30 20 30 High
Caterpie 40 10 50 High
Pandas where() with Multiple Condition
It is also possible to work with several conditions.
# Multiple Condition
filter1 = df['Power1']<40
filter2 = df['Power2']<50
df = df.where(filter1 & filter2, "OK")
print(df)
Output :
Power1 Power2 Power3 Power4
Bulbasaur 10 40 40 20
Charmander OK OK OK OK
Squirtle OK OK OK OK
Blastoise 30 20 30 70
Caterpie OK OK OK OK
Pandas where() with Series condition
If you want to fill an entire row based on a Pandas Series, it is possible to pass the Series in the condition.
filter1 = df['Power3']<50
df = df.where(filter1, "Strong")
print(df)
Output :
Power1 Power2 Power3 Power4
Bulbasaur 10 40 40 20
Charmander 20 50 10 30
Squirtle Strong Strong Strong Strong
Blastoise 30 20 30 70
Caterpie Strong Strong Strong Strong
Pandas where() with Callable condition
You can also specify a callable condition for your where cond parameter. This means that you can pass a function to evaluate your value. To test this, we will use a lambda function (this also works with a normal function):
# Callable condition
df = df.where(lambda x: x<50, "High")
print(df)
Output:
Power1 Power2 Power3 Power4
Bulbasaur 10 40 40 20
Charmander 20 High 10 30
Squirtle 20 High High High
Blastoise 30 20 30 High
Caterpie 40 10 High High
Conclusion
In this tutorial, we learned how to use the where() function present in the Pandas package. This package is very useful when you want to clean or process data in a dataframe.
If you have any questions about the use of this function, don’t hesitate to tell me in the comments, I will be happy to answer them.
Comments
Leave a comment