Pandas Sum() – Sum each Column and Row in Pandas DataFrame

Pandas sum(): In this tutorial, we will see how to use the sum() function to total the columns or rows of a pandas DataFrame.

Introduction

A pandas DataFrame is a two-dimensional, size-mutable tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both the row and the column labels.

The pandas library for Python lets you import data and analyze it quickly.

In this tutorial, we will see how to use the sum() function from the pandas library. It returns the sum of the values along the axis passed as a parameter. We will cover the following points:

  • Use the sum() function to sum the values along the index axis (axis=0), which gives one total per column
  • Use the sum() function to sum the values along the columns axis (axis=1), which gives one total per row
  • Sum the values with a multi-level index
  • Sum the values on a Series type

To illustrate these different points, we will use the following pandas dataframe:

import pandas as pd

data = (['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000])

df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

print(df)
Result:

                   Income_A  Income_B
Month    Day_Week                    
January  Monday       10000     30000
         Friday        5000     20000
February Monday     1000000   2000000
         Friday     2000000   5000000
         Sunday     5000000  10000000
March    Tuesday    4000000   8000000

This dataframe contains two income columns, Income_A and Income_B, indexed by month and by day of the week.

Pandas Dataframe sum() function

Pandas sum() Syntax

The sum() function is used to sum the values on a given axis. Its syntax is the following:

# Sum() function (pandas 1.x signature; the level parameter was removed in pandas 2.0)

DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)

The function can take 6 parameters:

Name | Description | Type | Default Value | Required
axis | The axis to apply the function along (0 = index, 1 = columns) | {index (0), columns (1)} | 0 (index) | No
skipna | Exclude NA / NULL values | Boolean | True | No
level | If the axis is a MultiIndex (hierarchical), sum along a particular level. Deprecated in pandas 1.3 and removed in 2.0. | int or level name | None | No
numeric_only | Include only float, int, boolean columns. If None, will try to use everything, then use only numeric data. Not implemented for Series. | Boolean | None (False in pandas 2.0 and later) | No
min_count | The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA. | int | 0 | No
**kwargs | Additional keyword arguments to be passed to the function. | | | No
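
The effect of skipna and min_count is easiest to see on data with missing values. The following is a minimal sketch; the small `sample` frame is a hypothetical example and not part of this tutorial's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values (not the tutorial's income data)
sample = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],        # one missing value
    "b": [np.nan, np.nan, np.nan],  # only missing values
})

# Default behaviour: NaN values are skipped, an all-NaN column sums to 0.0
default_sum = sample.sum()

# skipna=False: any NaN in a column makes that column's sum NaN
strict_sum = sample.sum(skipna=False)

# min_count=1: at least one valid value is required, otherwise the result is NaN
min_count_sum = sample.sum(min_count=1)

print(default_sum)
print(strict_sum)
print(min_count_sum)
```

With min_count=1, column "a" still sums to 4.0, while column "b" becomes NaN instead of 0.0, which can help tell "no data" apart from "a total of zero".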

Sum each Column in Pandas DataFrame

To sum each column of the DataFrame, set the axis parameter to 0:

# Sum each column 

df.sum(axis=0)

You can apply this code to our previously created dataframe:

import pandas as pd

data = (['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000])

df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

print(df.sum(axis=0))
Result:

Income_A    12015000
Income_B    25050000
dtype: int64

We obtain the total of Income_A and the total of Income_B over the whole period (January to March).
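
The column sums above work directly because the text fields (Month and Day_Week) were moved into the index. As a sketch of the alternative, assuming the same data is kept as ordinary columns, the numeric_only parameter restricts the sum to the numeric columns. The variable names `flat` and `totals` are illustrative only.

```python
import pandas as pd

data = (['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000])

# Same data as the tutorial, but Month and Day_Week stay regular columns
flat = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])

# numeric_only=True skips the two text columns and sums only the income columns
totals = flat.sum(numeric_only=True)

print(totals)
```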

Sum each Row in Pandas DataFrame

To sum each row of the DataFrame, use axis=1 as follows:

# Sum each row

df.sum(axis=1)

You can apply this code to our previously created dataframe:

import pandas as pd

data = (['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000])

df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

print(df.sum(axis=1))
Result:

Month     Day_Week
January   Monday         40000
          Friday         25000
February  Monday       3000000
          Friday       7000000
          Sunday      15000000
March     Tuesday     12000000
dtype: int64

In our example, this adds Income_A and Income_B together for each row.
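
A common follow-up is to keep the row totals next to the original values. The sketch below stores them in a new column; the column name `Total` and the helper list `income_cols` are illustrative choices, not something the dataframe defines.

```python
import pandas as pd

data = (['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000])

df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

# Sum only the income columns, row by row, and store the result in a new column
income_cols = ['Income_A', 'Income_B']
df['Total'] = df[income_cols].sum(axis=1)

print(df)
```

Selecting the columns explicitly before calling sum(axis=1) keeps the totals stable if the code is run twice, since the new Total column is not included in its own sum.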

Multi Level Index Sum

If your dataframe has a multi-level index, you can tell pandas which index level you want to sum over.

Our example dataframe has two index levels, Month and Day_Week. To sum according to the first level, you can use this:

import pandas as pd

data = (['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000])
df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

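# Older pandas versions wrote this as df.sum(level=0). The level parameter was
# deprecated in pandas 1.3 and removed in 2.0, so we group on the index level instead.
# sort=False keeps the months in their order of appearance.
print(df.groupby(level=0, sort=False).sum())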
Result:

          Income_A  Income_B
Month                       
January      15000     50000
February   8000000  17000000
March      4000000   8000000

To sum from the second level, you can do this:

# Multi Level Index Sum (groupby replaces df.sum(level=1), which pandas 2.0 removed)

df.groupby(level=1, sort=False).sum()
Result:

          Income_A  Income_B
Day_Week                    
Monday     1010000   2030000
Friday     2005000   5020000
Sunday     5000000  10000000
Tuesday    4000000   8000000
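
Index levels can also be referred to by name rather than by position, which tends to be easier to read. The sketch below uses groupby on the level name, an approach that works in pandas 2.0 and later, where sum() no longer accepts a level parameter. The variable names `by_month` and `by_day` are illustrative.

```python
import pandas as pd

data = (['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000])

df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

# Group on the index level by name; sort=False keeps the order of first appearance
by_month = df.groupby(level='Month', sort=False).sum()
by_day = df.groupby(level='Day_Week', sort=False).sum()

print(by_month)
print(by_day)
```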

Summing a Series

You can also use the pandas sum() function on a Series:

# Summing a Series

df['Income_A'].sum()
Result:

12015000
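
Because selecting a single column returns a Series, the same call can be combined with a selection on the index. The following sketch sums Income_A for one month only; the variable names are illustrative.

```python
import pandas as pd

data = (['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000])

df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

# Selecting one column gives a Series; sum() on a Series returns a single value
income_a = df['Income_A']
total_a = income_a.sum()

# Restrict the Series to the rows of one month before summing
february_a = income_a.loc['February'].sum()

print(total_a)
print(february_a)
```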

Conclusion

In this tutorial, we have seen how to use the sum() function of the pandas library. This function is very useful for analyzing data quickly and computing totals over the columns or rows of a dataframe.

If you have any questions about its use, don’t hesitate to ask me in the comments. I’ll be happy to answer them.

See you soon for new tutorials.

By ayed_amira

I'm a data scientist. Passionate about new technologies and programming, I created this website mainly for people who want to learn more about data science and programming :)
