Pandas where() – Select Rows based on Values in a Dataframe Column
Pandas where(): In this new tutorial we will see how to use the where() function on a column of a dataframe of the pandas module.
A pandas dataframe is a two-dimensional tabular data structure that can be modified in size with labeled axes that are commonly referred to as row and column labels, with different arithmetic operations aligned with the row and column labels.
The Pandas library, available on python, allows to import data and to make quick analysis on loaded data.
The where() function allows you to replace the values for which your condition is False. The main purpose of this function is to replace values that do not satisfy one or more criteria. By default, The rows not satisfying the condition are filled with NaN value.
This is a function we don’t use often, but in some cases, it can be very useful instead of using the .apply() function.
In this tutorial we will see the following topics:
- Pandas Where with DataFrame single condition
- Where() with DataFrame multiple condition
- Pandas Where with Series condition
- Pandas Where with Callable condition
To illustrate these different points, we will use the following dataframe:
import pandas as pd pokedex = [(10, 40, 40, 20), (20, 50, 10, 30), (20, 50, 70, 50), (30, 20, 30, 70), (40, 10, 50, 70) ] df = pd.DataFrame(pokedex, columns=['Power1', 'Power2', 'Power3', 'Power4'], index=['Bulbasaur', 'Charmander', 'Squirtle', 'Blastoise','Caterpie']) print(df)
Power1 Power2 Power3 Power4 Bulbasaur 10 40 40 20 Charmander 20 50 10 30 Squirtle 20 50 70 50 Blastoise 30 20 30 70 Caterpie 40 10 50 70
Pandas where() Syntax
The syntax of the where() function is as follows:
DataFrame.where(cond, other= NaN, inplace= False, axis= None, level= None, errors= 'raise', try_cast= False)
This function can take several parameters. Here is a detailed description of each parameter:
|The cond parameter is the place where the condition to be checked will be satisfied.||boolean, ||–||Yes|
|All values that do not satisfy the previously given condition will be treated accordingly by this other parameter.|| scalar, ||NaN||No|
|Whether to perform the operation in place on the data.||Boolean||False||No|
|axis||Alignment axis if needed.||Integer||None||No|
|level||Alignment level if needed.||Integer||None||No|
|try_cast||Try to cast the result back to the input type (if possible)||Boolean||False||No|
Pandas where() with Single Condition
To start off, we will filter in our dataframe all values that are greater than 60. To know all the values greater than 60 in our dataframe we can use the following condition:
print(df<60) Power1 Power2 Power3 Power4 Bulbasaur True True True True Charmander True True True True Squirtle True True False True Blastoise True True True False Caterpie True True True False
In this example, we see that 3 values are equal to False. To replace all the values that are greater than 60 by the value ‘High’, we can use the following row:
df = df.where(df<60, "High") print(df) Power1 Power2 Power3 Power4 Bulbasaur 10 40 40 20 Charmander 20 50 10 30 Squirtle 20 50 High 50 Blastoise 30 20 30 High Caterpie 40 10 50 High
Pandas where() with Multiple Condition
It is also possible to work with several conditions.
# Multiple Condition filter1 = df['Power1']<40 filter2 = df['Power2']<50 df = df.where(filter1 & filter2, "OK") print(df)
Power1 Power2 Power3 Power4 Bulbasaur 10 40 40 20 Charmander OK OK OK OK Squirtle OK OK OK OK Blastoise 30 20 30 70 Caterpie OK OK OK OK
Pandas where() with Series condition
If you want to fill an entire row based on a Pandas Series, it is possible to pass the Series in the condition.
filter1 = df['Power3']<50 df = df.where(filter1, "Strong") print(df)
Power1 Power2 Power3 Power4 Bulbasaur 10 40 40 20 Charmander 20 50 10 30 Squirtle Strong Strong Strong Strong Blastoise 30 20 30 70 Caterpie Strong Strong Strong Strong
Pandas where() with Callable condition
You can also specify a callable condition for your where cond parameter. This means that you can pass a function to evaluate your value. To test this, we will use a lambda function (this also works with a normal function):
# Callable condition df = df.where(lambda x: x<50, "High") print(df)
Power1 Power2 Power3 Power4 Bulbasaur 10 40 40 20 Charmander 20 High 10 30 Squirtle 20 High High High Blastoise 30 20 30 High Caterpie 40 10 High High
In this tutorial, we learned how to use the where() function present in the Pandas package. This package is very useful when you want to clean or process data in a dataframe.
If you have any questions about the use of this function, don’t hesitate to tell me in the comments, I will be happy to answer them.
Leave a comment