Pandas mean(): Calculate the average in a Pandas Dataframe

Pandas mean(): In this tutorial, we will see how to calculate the average along a requested axis of a pandas DataFrame.
Introduction
A pandas DataFrame is a two-dimensional, size-mutable tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both the row and column labels.
The pandas library for Python lets you import data and quickly analyze it once it is loaded.
The mean() function computes the arithmetic mean: the sum of all values divided by the number of values. It can be applied to a pandas DataFrame or to a Series.
If the function is applied to a Series, it returns a scalar value that is the average of all observations in the Series. If it is applied to a DataFrame, it returns a pandas Series that contains the average of the values along the specified axis.
The mean is also one of the statistics reported by the pandas describe() function.
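The short sketch below illustrates these points on a small, made-up dataframe (the name `scores` and its values are assumptions for illustration only, not the dataframe used in the rest of this tutorial). It compares the result of mean() with the formula computed by hand, shows the return type in each case, and reads the same values from describe().

```python
import pandas as pd

# Small illustrative data (hypothetical values, not the tutorial's dataframe)
scores = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]})

# Series.mean() returns a single scalar
series_mean = scores["a"].mean()
print(series_mean)  # 2.0

# Same value by hand: sum of the values divided by their count
manual_mean = scores["a"].sum() / len(scores["a"])
print(manual_mean)  # 2.0

# DataFrame.mean() returns a Series with one entry per column
frame_mean = scores.mean()
print(frame_mean["b"])  # 30.0

# describe() reports the same values in its "mean" row
describe_mean = scores.describe().loc["mean"]
print(describe_mean["a"], describe_mean["b"])  # 2.0 30.0
```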
In this tutorial, we will discuss the following points:
- Mean across columns
- Mean across rows
- Mean with a Series
- Skipping NA values
To illustrate these different points, we will use the following Pandas dataframe:
import pandas as pd
import numpy as np
NaN = np.nan
company = [(1100, 10000, 40, 200),
(1200, 5000, NaN, NaN),
(500, 600, 60, 500),
(3000, 3000, NaN, 700),
(5000, 4000, 50, 700)
]
df = pd.DataFrame(company, columns=['INCOME1', 'INCOME2', 'INCOME3', 'INCOME4'])
print(df)
Output:
INCOME1 INCOME2 INCOME3 INCOME4
0 1100 10000 40.0 200.0
1 1200 5000 NaN NaN
2 500 600 60.0 500.0
3 3000 3000 NaN 700.0
4 5000 4000 50.0 700.0
Pandas mean()
Pandas mean() – Syntax and parameters
Syntax
# Syntax
DataFrame.mean(axis=None,
skipna=None,
level=None,
numeric_only=None,
**kwargs)
It returns a Series containing the means, or a DataFrame if the level parameter is specified.
Parameters
The mean() function can take 5 parameters:
Name | Description | Type | Default Value | Required |
---|---|---|---|---|
axis | Axis along which the mean is computed: axis=0 or ‘index’ gives one mean per column, axis=1 or ‘columns’ gives one mean per row. | int or str | None | No |
skipna | Excludes null values when computing the result. Null values are skipped unless skipna=False is passed. | boolean | None | No |
level | If you have a MultiIndex, you can pass the name (or int) of a level to compute the mean for each group of that level. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0. | int or str | None | No |
numeric_only | If True, only int, float and boolean columns are included. If None, it will try to use everything, then use only the numeric data. This feature is not implemented for Series. | boolean | None | No |
**kwargs | Additional keyword arguments to be passed to the function. | – | – | No |
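As a rough illustration of two of these parameters, the sketch below uses made-up data (the names `mixed`, `sales`, `region` and `year` are assumptions for illustration only). It shows numeric_only=True on a dataframe that contains a text column. It also computes a mean per index level with groupby(level=...), which replaces the level argument in pandas versions where mean() no longer accepts it.

```python
import pandas as pd

# Hypothetical mixed-type data: one text column, two numeric columns
mixed = pd.DataFrame(
    {
        "name": ["A", "B", "C"],
        "income": [100, 200, 300],
        "cost": [10.0, 20.0, 60.0],
    }
)

# numeric_only=True restricts the calculation to the numeric columns
numeric_means = mixed.mean(numeric_only=True)
print(numeric_means)

# Hypothetical MultiIndex data to compute a mean per index level
idx = pd.MultiIndex.from_tuples(
    [("north", 2020), ("north", 2021), ("south", 2020)],
    names=["region", "year"],
)
sales = pd.DataFrame({"sales": [10, 30, 50]}, index=idx)

# One mean per value of the "region" level
region_means = sales.groupby(level="region").mean()
print(region_means)
```

Here the text column `name` is left out of the result, and `region_means` contains one row per region.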
Pandas Mean across columns
If you want to calculate the average of each column, you don’t have to specify the axis because by default the average will be calculated on the columns:
# Mean across columns
print(df.mean())
# equal to df.mean(axis=0) or df.mean(axis='index')
Output:
INCOME1 2160.0
INCOME2 4520.0
INCOME3 50.0
INCOME4 525.0
dtype: float64
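Note that INCOME3 has a mean of 50.0 even though it contains two null values. The sketch below rebuilds that column on its own (the variable names are assumptions for illustration) to show that null values are left out of both the sum and the count.

```python
import numpy as np
import pandas as pd

# The INCOME3 column from the example dataframe, rebuilt as a Series
income3 = pd.Series([40, np.nan, 60, np.nan, 50], name="INCOME3")

total = income3.sum()      # null values are skipped in the sum
n_valid = income3.count()  # count() only counts non-null values
print(total, n_valid)      # 150.0 3

# The mean is the sum divided by the number of non-null values
print(total / n_valid)     # 50.0
print(income3.mean())      # 50.0
```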
Pandas Mean across rows
To calculate the average of the values for each row, you must specify the axis (1 or ‘columns’):
# Mean across rows
print(df.mean(axis=1))
# Equal to
print(df.mean(axis='columns'))
Output:
0 2835.000000
1 3100.000000
2 415.000000
3 2233.333333
4 2437.500000
dtype: float64
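A common use of row means is to store them as a new column. The sketch below is one possible way to do it with assign(), using the first two rows of the example data (the names `df_rows` and `ROW_MEAN` are hypothetical).

```python
import numpy as np
import pandas as pd

# First two rows of the example dataframe
df_rows = pd.DataFrame(
    {
        "INCOME1": [1100, 1200],
        "INCOME2": [10000, 5000],
        "INCOME3": [40, np.nan],
        "INCOME4": [200, np.nan],
    }
)

# assign() returns a new dataframe and leaves df_rows unchanged
with_mean = df_rows.assign(ROW_MEAN=df_rows.mean(axis=1))
print(with_mean)
```

The second row only has two non-null values, so its mean is computed over those two values.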
Pandas Mean with a Series
If you want to calculate the average of a single column, select it as a pandas Series and call mean() on it:
# Mean with a pandas Series
print(df["INCOME1"].mean())
Output:
2160.0
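The way the column is selected changes the return type. As a small sketch (the name `incomes` is an assumption for illustration), single brackets give a Series and therefore a scalar mean, while a list of column names gives a DataFrame and therefore a Series of means.

```python
import pandas as pd

# The first two columns of the example dataframe
incomes = pd.DataFrame(
    {
        "INCOME1": [1100, 1200, 500, 3000, 5000],
        "INCOME2": [10000, 5000, 600, 3000, 4000],
    }
)

# Single brackets select a Series: mean() returns a scalar
one_column = incomes["INCOME1"].mean()
print(one_column)  # 2160.0

# A list of names selects a DataFrame: mean() returns a Series
two_columns = incomes[["INCOME1", "INCOME2"]].mean()
print(two_columns["INCOME2"])  # 4520.0
```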
Skipping NA values
By default, the mean() function ignores null values when calculating the average thanks to the skipna parameter. In some cases, it may be useful to have the function return NaN if the column or row contains a null value.
Here is how to do it:
# Do not skip null values: a column containing a null value returns NaN
print(df.mean(skipna=False))
Output:
INCOME1 2160.0
INCOME2 4520.0
INCOME3 NaN
INCOME4 NaN
dtype: float64
We can see that the result is NaN for the columns INCOME3 and INCOME4 because they contain null values.
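Skipping null values and returning NaN are not the only options. As a sketch of a third possibility, the example below replaces null values with 0 using fillna() before taking the mean. Whether this makes sense depends on what a missing value means in your data, so treat it as an assumption rather than a recommendation. The variable names are for illustration only.

```python
import numpy as np
import pandas as pd

# The INCOME3 column from the example dataframe, rebuilt as a Series
income3 = pd.Series([40, np.nan, 60, np.nan, 50], name="INCOME3")

# Default behaviour: null values are skipped, 150 / 3
skipped = income3.mean()
print(skipped)  # 50.0

# skipna=False: any null value makes the result NaN
strict = income3.mean(skipna=False)
print(strict)  # nan

# Null values replaced by 0 first: 150 / 5
filled = income3.fillna(0).mean()
print(filled)  # 30.0
```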
Conclusion
In this tutorial we have seen how to calculate the average on a requested axis. This kind of function is very useful to understand, analyze or describe data.
I hope you found it interesting and don’t hesitate to ask me questions in the comments if you need help on how to use this function.
See you soon for new articles!