Pandas Sum() – Sum each Column and Row in Pandas DataFrame

Pandas sum(): In this tutorial, we will see how to use the sum() function on the columns or rows of a Pandas dataframe.
Introduction
A pandas dataframe is a two-dimensional, size-mutable tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both the row and the column labels.
The Pandas library for Python lets you import data and analyze it quickly.
In this tutorial, we will see how to use the sum() function of the pandas library. This function returns the sum of the values along the axis passed as a parameter. We will cover the following points:
- Use the sum() function to sum the values on the index axis (the rows)
- Use the sum() function to sum the values on the columns axis
- Sum the values with a multi-level index
- Sum the values on a Series type
To illustrate these different points, we will use the following pandas dataframe:
import pandas as pd
data = (['January', 'Monday', 10000, 30000],
['January', 'Friday', 5000, 20000],
['February', 'Monday', 1000000, 2000000],
['February', 'Friday', 2000000, 5000000],
['February', 'Sunday', 5000000, 10000000],
['March', 'Tuesday', 4000000, 8000000])
df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])
print(df)
Result:
                   Income_A  Income_B
Month    Day_Week
January  Monday       10000     30000
         Friday        5000     20000
February Monday     1000000   2000000
         Friday     2000000   5000000
         Sunday     5000000  10000000
March    Tuesday    4000000   8000000
This dataframe contains the different incomes generated per month and per day.
Pandas Dataframe sum() function
Pandas sum() Syntax
The sum() function is used to sum the values on a given axis. Its syntax is the following:
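# Sum() function (signature as documented for pandas versions before 2.0;
# the level parameter was deprecated in pandas 1.3 and removed in 2.0)
DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)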
The function can take 6 parameters:
Name | Description | Type | Default Value | Required |
---|---|---|---|---|
axis | The axis along which to apply the function (0 = index, 1 = columns) | {index (0), columns (1)} | 0 (index) | No |
skipna | Exclude NA / NULL values | bool | True | No |
level | If the axis is a MultiIndex (hierarchical), sum along a particular level. Deprecated in pandas 1.3 and removed in 2.0; use groupby(level=...).sum() instead. | int or level name | None | No |
numeric_only | Include only float, int, boolean columns. Not implemented for Series. | bool | None before pandas 2.0 (try everything, then fall back to numeric data); False since 2.0 | No |
min_count | The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA. | int | 0 | No |
** kwargs | Additional arguments to be passed to the function. | – | – | No |
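The effect of skipna and min_count is easiest to see on data with missing values. The following is a minimal sketch; the demo frame and its column names x and y are made up for this illustration and are not part of the tutorial's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values, used only for this illustration
demo = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, np.nan, np.nan]})

# Default behaviour: NaN values are skipped, so an all-NaN column sums to 0.0
default_sum = demo.sum()

# skipna=False: a single NaN is enough to make the column total NaN
strict_sum = demo.sum(skipna=False)

# min_count=1: at least one valid value is required, otherwise the total is NaN
guarded_sum = demo.sum(min_count=1)

print(default_sum)
print(strict_sum)
print(guarded_sum)
```

Using min_count=1 is a simple way to tell a column that really sums to zero apart from a column that contains no valid data at all.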
Sum each Column in Pandas DataFrame
In order to sum each column of the DataFrame, you can use the axis parameter in this way:
# Sum each column
df.sum(axis=0)
You can apply this code to our previously created dataframe:
import pandas as pd
data = (['January', 'Monday', 10000, 30000],
['January', 'Friday', 5000, 20000],
['February', 'Monday', 1000000, 2000000],
['February', 'Friday', 2000000, 5000000],
['February', 'Sunday', 5000000, 10000000],
['March', 'Tuesday', 4000000, 8000000])
df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])
# axis=0 adds up the rows, giving one total per column
print(df.sum(axis=0))
Result:
Income_A    12015000
Income_B    25050000
dtype: int64
We obtain the total of Income_A and the total of Income_B over all rows, that is, for the whole January to March period.
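If the text columns are still regular columns (that is, set_index has not been called), they take part in the operation unless they are excluded. A minimal sketch, assuming the same data as above kept in a flat frame; the variable names flat and column_totals are only illustrative:

```python
import pandas as pd

data = [['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000]]

# No set_index here: 'Month' and 'Day_Week' remain ordinary text columns
flat = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])

# numeric_only=True restricts the sum to the numeric columns
column_totals = flat.sum(numeric_only=True)
print(column_totals)
```

Passing numeric_only explicitly also keeps the result the same across pandas versions, since the default of this parameter changed in pandas 2.0.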
Sum each Row in Pandas DataFrame
To sum each row of the DataFrame, pass axis=1 as follows:
# Sum each row
df.sum(axis=1)
You can apply this code to our previously created dataframe:
import pandas as pd
data = (['January', 'Monday', 10000, 30000],
['January', 'Friday', 5000, 20000],
['February', 'Monday', 1000000, 2000000],
['February', 'Friday', 2000000, 5000000],
['February', 'Sunday', 5000000, 10000000],
['March', 'Tuesday', 4000000, 8000000])
df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])
print(df.sum(axis=1))
Result:
Month     Day_Week
January   Monday         40000
          Friday         25000
February  Monday       3000000
          Friday       7000000
          Sunday      15000000
March     Tuesday     12000000
dtype: int64
In our example, this allows us to sum the income A and B for each row.
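A common follow-up is to keep the row totals next to the original values. The sketch below stores them in a new column; the name Total is an assumption chosen for this example, not something defined earlier in the tutorial.

```python
import pandas as pd

data = [['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000]]
df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

# Select the columns explicitly so that re-running the line
# never adds an existing 'Total' column into the sum
df['Total'] = df[['Income_A', 'Income_B']].sum(axis=1)
print(df)
```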
Multi Level Index Sum
If your dataframe has a multi-level index, you can tell pandas which index level to group the sums by.
The index of our example dataframe has 2 levels (Month and Day_Week). To get one total per value of the first level, you can use this:
import pandas as pd
data = (['January', 'Monday', 10000, 30000],
['January', 'Friday', 5000, 20000],
['February', 'Monday', 1000000, 2000000],
['February', 'Friday', 2000000, 5000000],
['February', 'Sunday', 5000000, 10000000],
['March', 'Tuesday', 4000000, 8000000])
df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])
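# df.sum(level=0) only works before pandas 2.0; grouping on the index level
# gives the same result (sort=False keeps the original order of the months)
print(df.groupby(level=0, sort=False).sum())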
Result:
          Income_A  Income_B
Month
January      15000     50000
February   8000000  17000000
March      4000000   8000000
To sum from the second level, you can do this:
# Multi Level Index Sum (df.sum(level=1) only works before pandas 2.0)
df.groupby(level=1, sort=False).sum()
Result:
          Income_A  Income_B
Day_Week
Monday     1010000   2030000
Friday     2005000   5020000
Sunday     5000000  10000000
Tuesday    4000000   8000000
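Index levels can also be addressed by name instead of by position, which tends to be easier to read. A minimal sketch using groupby on the index level, an approach that also works on pandas versions where sum() no longer accepts a level argument; the variable names by_month and by_day are only illustrative:

```python
import pandas as pd

data = [['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000]]
df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

# One total per month; sort=False keeps the order of first appearance
by_month = df.groupby(level='Month', sort=False).sum()

# One total per day of the week
by_day = df.groupby(level='Day_Week', sort=False).sum()

print(by_month)
print(by_day)
```

Without sort=False, groupby sorts the group labels, so the months would be listed alphabetically (February, January, March).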
Summing a Series
You can also use the pandas sum() function on a Series, for example on a single column of the dataframe. In that case, the result is a single value:
# Summing a Series
df['Income_A'].sum()
Result:
12015000
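Because a column taken from our dataframe keeps the multi-level index, a Series can also be summed for one part of the index only. The sketch below totals Income_A for February; the variable names income_a and february_total are assumptions made for this example.

```python
import pandas as pd

data = [['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000]]
df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

# Selecting one column returns a Series that keeps the (Month, Day_Week) index
income_a = df['Income_A']

# .loc with a first-level label keeps only the February rows before summing
february_total = income_a.loc['February'].sum()
print(february_total)
```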
Conclusion
In this tutorial, we have seen how to use the sum() function of the pandas library. This function is very useful for quickly analyzing data and computing totals over the columns or rows of a dataframe.
If you have any questions about its use, don’t hesitate to ask me in comments, I’ll be happy to answer them.
See you soon for new tutorials.