Pandas where() – Select Rows based on Values in a Dataframe Column

pandas where

Pandas where(): In this new tutorial we will see how to use the where() function on a column of a dataframe of the pandas module.

Introduction

A pandas dataframe is a two-dimensional tabular data structure that can be modified in size with labeled axes that are commonly referred to as row and column labels, with different arithmetic operations aligned with the row and column labels.

The Pandas library, available on python, allows to import data and to make quick analysis on loaded data.

The where() function allows you to replace the values for which your condition is False. The main purpose of this function is to replace values that do not satisfy one or more criteria. By default, The rows not satisfying the condition are filled with NaN value.

This is a function we don’t use often, but in some cases, it can be very useful instead of using the .apply() function.

In this tutorial we will see the following topics:

  • Pandas Where with DataFrame single condition
  • Where() with DataFrame multiple condition
  • Pandas Where with Series condition
  • Pandas Where with Callable condition

To illustrate these different points, we will use the following dataframe:

import pandas as pd

pokedex = [(10, 40, 40, 20),
           (20, 50, 10, 30),
           (20, 50, 70, 50),
           (30, 20, 30, 70),
           (40, 10, 50, 70)
           ]

df = pd.DataFrame(pokedex, 
columns=['Power1', 'Power2', 'Power3', 'Power4'], 
index=['Bulbasaur', 'Charmander', 'Squirtle', 'Blastoise','Caterpie'])

print(df)

Output :

                Power1  Power2  Power3  Power4
Bulbasaur       10      40      40      20
Charmander      20      50      10      30
Squirtle        20      50      70      50
Blastoise       30      20      30      70
Caterpie        40      10      50      70

Pandas where()

Pandas where() Syntax

The syntax of the where() function is as follows:

DataFrame.where(cond,
                other= NaN,
                inplace= False,
                axis= None,
                level= None,
                errors= 'raise',
                try_cast= False) 

This function can take several parameters. Here is a detailed description of each parameter:

NameDescriptionTypeDefault ValueRequired
condThe cond parameter is the place where the condition to be checked will be satisfied.boolean, Series or DataFrame, an array-like structure or a callableYes
otherAll values that do not satisfy the previously given condition will be treated accordingly by this other parameter. scalar, Series/DataFrame, or a callableNaNNo
inplaceWhether to perform the operation in place on the data.BooleanFalseNo
axisAlignment axis if needed.Integer NoneNo
levelAlignment level if needed.Integer NoneNo
errorsStringraiseNo
try_castTry to cast the result back to the input type (if possible)BooleanFalseNo

Pandas where() with Single Condition

To start off, we will filter in our dataframe all values that are greater than 60. To know all the values greater than 60 in our dataframe we can use the following condition:

print(df<60)

              Power1  Power2  Power3  Power4
Bulbasaur     True    True    True    True
Charmander    True    True    True    True
Squirtle      True    True   False    True
Blastoise     True    True    True   False
Caterpie      True    True    True   False

In this example, we see that 3 values are equal to False. To replace all the values that are greater than 60 by the value ‘High’, we can use the following row:

df =  df.where(df<60, "High")
print(df)

            Power1  Power2 Power3 Power4
Bulbasaur       10      40     40     20
Charmander      20      50     10     30
Squirtle        20      50   High     50
Blastoise       30      20     30   High
Caterpie        40      10     50   High

Pandas where() with Multiple Condition

It is also possible to work with several conditions.

# Multiple Condition 

filter1 = df['Power1']<40
filter2 = df['Power2']<50

df = df.where(filter1 & filter2, "OK")
print(df)

Output :

           Power1 Power2 Power3 Power4
Bulbasaur      10     40     40     20
Charmander     OK     OK     OK     OK
Squirtle       OK     OK     OK     OK
Blastoise      30     20     30     70
Caterpie       OK     OK     OK     OK

Pandas where() with Series condition

If you want to fill an entire row based on a Pandas Series, it is possible to pass the Series in the condition.

filter1 = df['Power3']<50

df = df.where(filter1, "Strong")
print(df)

Output :

            Power1  Power2  Power3  Power4
Bulbasaur       10      40      40      20
Charmander      20      50      10      30
Squirtle    Strong  Strong  Strong  Strong
Blastoise       30      20      30      70
Caterpie    Strong  Strong  Strong  Strong

Pandas where() with Callable condition

You can also specify a callable condition for your where cond parameter. This means that you can pass a function to evaluate your value. To test this, we will use a lambda function (this also works with a normal function):

# Callable condition

df = df.where(lambda x: x<50, "High")
print(df)

Output:

            Power1 Power2 Power3 Power4
Bulbasaur       10     40     40     20
Charmander      20   High     10     30
Squirtle        20   High   High   High
Blastoise       30     20     30   High
Caterpie        40     10   High   High

Conclusion

In this tutorial, we learned how to use the where() function present in the Pandas package. This package is very useful when you want to clean or process data in a dataframe.

If you have any questions about the use of this function, don’t hesitate to tell me in the comments, I will be happy to answer them.

Back to Python Menu

Published
Categorized as Python

By ayed_amira

I'm a data scientist. Passionate about new technologies and programming I created this website mainly for people who want to learn more about data science and programming :)

Leave a comment

Your email address will not be published.