Tags: pandas, dataframe, numpy, for-loop, mean

How to calculate the mean of consecutive datapoints in a column of data without looping


I am trying to calculate the mean of consecutive data points (the i-th and (i+1)-th entries) in a column, using a for loop over the column indices. The only way I could get the loop to run was to stop one short of the last index, since the last row has no next entry. Is there a Pandas or NumPy way to compute these averages over the whole column without a for loop (which is slow) and without having to subtract one from the last index?

Here's my attempt so far, using an extract from my larger dataset:

import pandas as pd
df = pd.read_csv('class_data.dat',sep='\t')
df
    Age     BMI     Gender
0   23.0    17.2    Male
1   25.6    16.3    Female
2   26.4    22.5    Female
3   43.2    33.0    Male
4   22.5    21.8    Male
5   19.4    29.6    Male
6   20.5    34.6    Female
7   22.7    27.2    Female
8   17.5    15.5    Male
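# Average each BMI value with the next one; the last row has no successor,
# so the loop stops at len(df) - 1.
BMI_means = {}
for i in range(len(df) - 1):
    BMI_means[i] = (df.BMI[i] + df.BMI[i + 1]) / 2

BMI_means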

output:

{0: 16.75,
 1: 19.4,
 2: 27.75,
 3: 27.4,
 4: 25.700000000000003,
 5: 32.1,
 6: 30.9,
 7: 21.35}

Solution

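  • Here is one way: shift(-1) moves the column up by one row so that each value lines up with the one after it; add the two and divide by 2. The last row has no successor, so its result is NaN.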

    df['AVG'] = (df['BMI'] + df['BMI'].shift(-1)).div(2)
    

    output:

        Age   BMI  Gender    AVG
    0  23.0  17.2    Male  16.75
    1  25.6  16.3  Female  19.40
    2  26.4  22.5  Female  27.75
    3  43.2  33.0    Male  27.40
    4  22.5  21.8    Male  25.70
    5  19.4  29.6    Male  32.10
    6  20.5  34.6  Female  30.90
    7  22.7  27.2  Female  21.35
    8  17.5  15.5    Male    NaN
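
Two other loop-free variants may also be worth trying. This is only a sketch: the DataFrame is rebuilt inline from the BMI values in the question instead of being read from class_data.dat, and the names rolling_avg and pair_means are made up for illustration. The rolling version should give the same column as the shift approach (NaN in the last row), while the NumPy slicing version returns an array with one element fewer and no NaN.

```python
import numpy as np
import pandas as pd

# BMI values copied from the sample data in the question
df = pd.DataFrame({"BMI": [17.2, 16.3, 22.5, 33.0, 21.8, 29.6, 34.6, 27.2, 15.5]})

# Pandas: a 2-row rolling mean is labelled at the second row of each window,
# so shift it up by one row to align it with the first row of each pair.
rolling_avg = df["BMI"].rolling(2).mean().shift(-1)

# NumPy: add the array without its last element to the array without its
# first element, then halve. The result has len(df) - 1 entries.
bmi = df["BMI"].to_numpy()
pair_means = (bmi[:-1] + bmi[1:]) / 2
```

If the result has to go back into the DataFrame as a column, the rolling (or shift) version is the easier fit because it keeps the original index; the NumPy version is handy when only the array of pairwise means is needed.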