Pandas mean(): Calculate the average in a Pandas Dataframe

Pandas mean(): In this tutorial, we will see how to calculate the average along a requested axis of a pandas dataframe.

Introduction

A pandas dataframe is a two-dimensional, size-mutable tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both the row and column labels.

The pandas library for Python lets you import data and quickly analyze it once it is loaded.

The mean() function computes the arithmetic mean: the sum of all terms divided by the total number of terms. This function can be applied to a pandas dataframe or a series.
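As a quick check of this formula, here is a minimal sketch that computes the average by hand and compares it with mean(). The variable names are only illustrative, and the values are those of the INCOME1 column from the example dataframe used later in this tutorial.

```python
import pandas as pd

# Illustrative values (same as the INCOME1 column used later on)
values = pd.Series([1100, 1200, 500, 3000, 5000])

# Sum of all terms divided by the number of (non-null) terms
manual_mean = values.sum() / values.count()

print(manual_mean)     # 2160.0
print(values.mean())   # 2160.0
```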

If the method is applied to a Series, it returns a scalar value: the average of all observations in the Series. If it is applied to a dataframe, it returns a pandas Series that contains the average of the values along the specified axis.
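The difference between the two return types can be seen in a short sketch. The dataframe df_demo and its columns A and B are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical two-column dataframe, used only for this illustration
df_demo = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

series_result = df_demo["A"].mean()  # applied to a Series: a scalar
frame_result = df_demo.mean()        # applied to a dataframe: a Series

print(series_result)                # 2.0
print(type(frame_result).__name__)  # Series
print(frame_result["B"])            # 20.0
```

The Series returned for the dataframe is indexed by the column labels, so each column average can be read by name.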

The mean is also one of the statistics reported by the pandas describe() method.
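As a small sketch of this, the "mean" row of the describe() summary can be compared with the result of mean(). The dataframe df_demo is again a hypothetical example.

```python
import pandas as pd

# Hypothetical dataframe, used only for this illustration
df_demo = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

summary = df_demo.describe()

# describe() returns one row per statistic; "mean" is one of them
mean_row = summary.loc["mean"]

print(mean_row["A"])  # 2.0
print(mean_row["B"])  # 20.0
```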

In this tutorial, we will discuss the following points:

  • Mean across columns
  • Mean across rows
  • Mean with a Series
  • Skipping NA values

To illustrate these different points, we will use the following Pandas dataframe:

import pandas as pd
import numpy as np

NaN = np.nan

company = [(1100, 10000, 40, 200),
           (1200, 5000, NaN, NaN),
           (500, 600, 60, 500),
           (3000, 3000, NaN, 700),
           (5000, 4000, 50, 700)
           ]

df = pd.DataFrame(company, columns=['INCOME1', 'INCOME2', 'INCOME3', 'INCOME4'])

print(df)

Output:

   INCOME1  INCOME2  INCOME3  INCOME4
0     1100    10000     40.0    200.0
1     1200     5000      NaN      NaN
2      500      600     60.0    500.0
3     3000     3000      NaN    700.0
4     5000     4000     50.0    700.0

Pandas mean()

Pandas mean() – Syntax and parameters

Syntax

# Syntax

DataFrame.mean(axis=None,
               skipna=None,
               level=None,
               numeric_only=None,
               **kwargs)

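It returns a Series, or a DataFrame if the level is specified. Note that this is the signature of pandas 1.x: since pandas 2.0, the level parameter has been removed and the defaults are axis=0, skipna=True and numeric_only=False.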

Parameters

The mean() function can take 5 parameters:

Name | Description | Type | Default Value | Required
axis | Axis along which the mean is computed: axis=0 or 'index' gives the mean of each column, axis=1 or 'columns' gives the mean of each row. | int or str | None | No
skipna | Excludes all the null values when computing the result. | boolean | None | No
level | If the axis is a multi index, the name (or int) of the level along which to compute the mean. | int or str | None | No
numeric_only | Includes only int, float and boolean columns. If None, it will try to use everything, then use only the numeric data. Not implemented for Series. | boolean | None | No
**kwargs | Additional keyword arguments to be passed to the function. | | | No
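To illustrate the numeric_only parameter, here is a minimal sketch on a dataframe that mixes text and numeric columns. The dataframe mixed and its column names are hypothetical and are not part of the example used in the rest of this tutorial.

```python
import pandas as pd

# Hypothetical dataframe with one text column and two numeric columns
mixed = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [1.0, 2.0, 6.0],
    "age": [20, 30, 40],
})

# Only the numeric columns are kept; "name" is left out of the result
numeric_means = mixed.mean(numeric_only=True)

print(list(numeric_means.index))  # ['score', 'age']
print(numeric_means["score"])     # 3.0
print(numeric_means["age"])       # 30.0
```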

Pandas Mean across columns

If you want to calculate the average of each column, you don’t have to specify the axis because by default the average will be calculated on the columns:

# Mean across columns 

print(df.mean())

# equal to df.mean(axis=0) or df.mean(axis='index')

Output:

INCOME1    2160.0
INCOME2    4520.0
INCOME3      50.0
INCOME4     525.0
dtype: float64

Pandas Mean across rows

To calculate the average of the values in each row, you must set the axis to 1 or 'columns':

# Mean across rows

print(df.mean(axis=1))

# equal to df.mean(axis='columns')

Output:

0    2835.000000
1    3100.000000
2     415.000000
3    2233.333333
4    2437.500000
dtype: float64
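The value for row 3 in the output above may look surprising: the row has four columns, but its null value is skipped, so the sum is divided by 3 rather than 4. The following sketch rebuilds that row as a standalone Series (the variable name row is only illustrative) to check the arithmetic.

```python
import numpy as np
import pandas as pd

# Row 3 of the example dataframe, rebuilt as a standalone Series
row = pd.Series([3000, 3000, np.nan, 700],
                index=["INCOME1", "INCOME2", "INCOME3", "INCOME4"])

print(row.count())  # 3 -> only non-null values are counted
print(row.sum())    # 6700.0 -> the null value is skipped in the sum

# Same result as df.mean(axis=1) for this row: 6700 / 3
manual_mean = row.sum() / row.count()
print(round(manual_mean, 6))  # 2233.333333
```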

Pandas Mean with a Series

If you want to calculate the average of a single column, select it as a pandas Series and call mean() on it:

# Mean with a pandas Series

print(df["INCOME1"].mean())

Output:

2160.0

Skipping NA values

By default, the mean() function ignores null values when calculating the average thanks to the skipna parameter. In some cases, it may be useful to have the function return NaN if the column or row contains a null value.

Here is how to do it:

# Do not skip null values

print(df.mean(skipna=False))

Output:

INCOME1    2160.0
INCOME2    4520.0
INCOME3       NaN
INCOME4       NaN
dtype: float64

The result is NaN for INCOME3 and INCOME4 because these two columns contain null values.
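As a sketch of how this can be used, the example below rebuilds the dataframe from this tutorial and checks that the columns whose mean is NaN with skipna=False are the ones flagged by isna().any(). It also applies the same idea to rows with axis=1. The variable names strict_means, has_null and row_strict are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the example dataframe used in this tutorial
company = [(1100, 10000, 40, 200),
           (1200, 5000, np.nan, np.nan),
           (500, 600, 60, 500),
           (3000, 3000, np.nan, 700),
           (5000, 4000, 50, 700)]
df = pd.DataFrame(company, columns=['INCOME1', 'INCOME2', 'INCOME3', 'INCOME4'])

# Column means without skipping nulls, and a per-column null check
strict_means = df.mean(skipna=False)
has_null = df.isna().any()

print(has_null.tolist())             # [False, False, True, True]
print(strict_means.isna().tolist())  # [False, False, True, True]

# The same parameter works row by row
row_strict = df.mean(axis=1, skipna=False)
print(row_strict.isna().tolist())    # [False, True, False, True, False]
```

Rows 1 and 3 return NaN because they are the only rows that contain at least one null value.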

Conclusion

In this tutorial we have seen how to calculate the average on a requested axis. This kind of function is very useful to understand, analyze or describe data.

I hope you found it interesting and don’t hesitate to ask me questions in the comments if you need help on how to use this function.

See you soon for new articles!

By ayed_amira

I'm a data scientist. Passionate about new technologies and programming I created this website mainly for people who want to learn more about data science and programming :)
