Pandas mean(): Calculate the average in a Pandas Dataframe

By ayed_amira, on 08/17/2021, updated on 08/18/2021 - 4 minutes to read

Pandas mean(): In this tutorial, we will see how to calculate the average along a requested axis of a pandas dataframe.

Introduction

A pandas dataframe is a two-dimensional, size-mutable tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both the row and the column labels.

The Pandas library, available in Python, lets you import data and quickly analyze it.

The mean() function computes the arithmetic mean: the sum of all terms divided by the number of terms. It can be applied to a pandas dataframe or a Series.
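To check that formula, here is a minimal sketch that computes the average by hand and compares it with mean(). The values are those of the INCOME1 column used in this tutorial; the variable names are only illustrative.

```python
import pandas as pd

# Values of the INCOME1 column from the tutorial's example data
values = pd.Series([1100, 1200, 500, 3000, 5000])

# Arithmetic mean by hand: sum of the terms divided by the number of terms
manual_mean = values.sum() / len(values)

print(manual_mean)    # 2160.0
print(values.mean())  # 2160.0
```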

If the function is applied to a Series, it returns a scalar: the average of all observations in the Series. If it is applied to a dataframe, it returns a pandas Series that contains the average of the values along the specified axis.
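The difference between the two return types can be seen with a small sketch. The dataframe `sample` and its columns A and B are made up for this illustration only.

```python
import pandas as pd

# Small hypothetical dataframe used only for this illustration
sample = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 60]})

series_mean = sample["A"].mean()  # Series -> a single scalar
frame_mean = sample.mean()        # DataFrame -> a Series, one value per column

print(series_mean)                # 2.0
print(type(frame_mean).__name__)  # Series
print(frame_mean["B"])            # 30.0
```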

The mean is also included in the output of the describe() function.

In this tutorial, we will discuss the following points:

• Mean across columns
• Mean across rows
• Mean with a Series
• Skipping NA values

To illustrate these different points, we will use the following Pandas dataframe:

```python
import pandas as pd
import numpy as np

NaN = np.nan

company = [(1100, 10000, 40, 200),
(1200, 5000, NaN, NaN),
(500, 600, 60, 500),
(3000, 3000, NaN, 700),
(5000, 4000, 50, 700)
]

df = pd.DataFrame(company, columns=['INCOME1', 'INCOME2', 'INCOME3', 'INCOME4'])

print(df)
```

Output:

```
   INCOME1  INCOME2  INCOME3  INCOME4
0     1100    10000     40.0    200.0
1     1200     5000      NaN      NaN
2      500      600     60.0    500.0
3     3000     3000      NaN    700.0
4     5000     4000     50.0    700.0
```

Pandas mean()

Pandas mean() – Syntax and parameters

Syntax

```python
# Syntax

DataFrame.mean(axis=None,
               skipna=None,
               level=None,
               numeric_only=None,
               **kwargs)
```

It returns a Series, or a DataFrame if the level parameter is specified.

Parameters

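The mean() function can take 5 parameters:

• axis: the axis to average over. 0 or 'index' returns one mean per column; 1 or 'columns' returns one mean per row.
• skipna: whether to exclude null values when computing the result. They are excluded by default.
• level: for a MultiIndex axis, the level along which to compute the mean. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0.
• numeric_only: whether to include only float, int and boolean columns.
• **kwargs: additional keyword arguments passed to the function.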
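Among these parameters, numeric_only is useful when a dataframe mixes text and numbers. The sketch below uses a hypothetical dataframe `mixed` with a made-up NAME column. Without numeric_only=True, the behavior depends on the pandas version (recent versions raise an error on text columns), so passing it explicitly is the safer choice.

```python
import pandas as pd

# Hypothetical dataframe mixing a text column with numeric ones
mixed = pd.DataFrame({
    "NAME": ["a", "b", "c"],
    "INCOME1": [1100, 1200, 500],
    "INCOME2": [10000, 5000, 600],
})

# numeric_only=True restricts the calculation to numeric columns,
# so the text column NAME is left out of the result
numeric_means = mixed.mean(numeric_only=True)

print(numeric_means)
```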

Pandas Mean across columns

To calculate the average of each column, you don’t have to specify the axis: by default (axis=0), one mean is computed per column:

```python
# Mean across columns

print(df.mean())

# equal to df.mean(axis=0) or df.mean(axis='index')
```

Output:

```
INCOME1    2160.0
INCOME2    4520.0
INCOME3      50.0
INCOME4     525.0
dtype: float64
```

Pandas Mean across rows

To calculate the average of the values for each row, you must specify the axis (1 or ‘columns’):

```python
# Mean across rows

print(df.mean(axis=1))

# Equivalent to df.mean(axis='columns')
```

Output:

```
0    2835.000000
1    3100.000000
2     415.000000
3    2233.333333
4    2437.500000
dtype: float64
```
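A common follow-up is to store the row averages next to the data. Here is a minimal sketch, assuming the same example dataframe; ROW_MEAN is a hypothetical column name chosen for this illustration.

```python
import numpy as np
import pandas as pd

# Same data as the tutorial's example dataframe
company = [(1100, 10000, 40, 200),
           (1200, 5000, np.nan, np.nan),
           (500, 600, 60, 500),
           (3000, 3000, np.nan, 700),
           (5000, 4000, 50, 700)]
df = pd.DataFrame(company, columns=["INCOME1", "INCOME2", "INCOME3", "INCOME4"])

# assign() returns a copy with the extra column; df itself is unchanged.
# ROW_MEAN is a hypothetical column name chosen for this sketch.
with_mean = df.assign(ROW_MEAN=df.mean(axis=1))

print(with_mean)
```

Using assign() rather than writing to df directly keeps the original dataframe intact, so the other examples in this tutorial still produce the same output.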

Pandas Mean with a Series

If you want to calculate the average of a single column, select it first: a column is a pandas Series, which also provides the mean() function:

```python
# Mean of a single column (a pandas Series)

print(df["INCOME1"].mean())
```

Output:

```
2160.0
```

Skipping NA values

By default, the mean() function ignores null values when calculating the average thanks to the skipna parameter. In some cases, it may be useful to have the function return NaN if the column or row contains a null value.

Here is how to do it:

```python
# Do not skip null values: a column containing NaN returns NaN

print(df.mean(skipna=False))
```

Output:

```
INCOME1    2160.0
INCOME2    4520.0
INCOME3       NaN
INCOME4       NaN
dtype: float64
```

The result for INCOME3 and INCOME4 is NaN because both columns contain at least one null value.
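To make the effect of skipna more concrete, the sketch below looks at the INCOME3 values alone. Replacing the missing values with 0 through fillna() is shown only as a possible alternative, not as a recommendation: it changes the divisor and therefore the result.

```python
import numpy as np
import pandas as pd

# Values of the INCOME3 column from the tutorial's example data
income3 = pd.Series([40, np.nan, 60, np.nan, 50], name="INCOME3")

# Default (skipna=True): NaN entries are dropped, so the divisor is 3
print(income3.mean())              # 50.0

# skipna=False: a single NaN makes the whole result NaN
print(income3.mean(skipna=False))  # nan

# Treating missing values as 0 keeps all 5 rows in the divisor
print(income3.fillna(0).mean())    # 30.0
```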

Conclusion

In this tutorial we have seen how to calculate the average on a requested axis. This kind of function is very useful to understand, analyze or describe data.

I hope you found it interesting and don’t hesitate to ask me questions in the comments if you need help on how to use this function.

See you soon for new articles!