Search code examples
pythonpandasdataframemean

calculate the average of each user in python


I have a data frame (df) with these columns: user, values, and time.

df = pd.DataFrame({'user': ['user_1', 'user_2', 'user_3','user_1', 'user_2',  'user_3',
                            'user_1', 'user_2',  'user_3'],
                   'values': [[1, 0, 2, 0], [1, 8, 0, 2],[6, 2, 0, 0],
                              [5, 0, 2, 2], [3, 8, 0, 0],[6, 0, 0, 2],
                             [3, 1, 1, 3], [2, 4, 1, 0],[4, 2, 0, 1]],
                   'time': [1, 1, 1, 2, 2, 2, 3, 3, 3]})

output:

    user       values     time
0   user_1  [1, 0, 2, 0]    1
1   user_2  [1, 8, 0, 2]    1
2   user_3  [6, 2, 0, 0]    1
3   user_1  [5, 0, 2, 2]    2
4   user_2  [3, 8, 0, 0]    2
5   user_3  [6, 0, 0, 2]    2
6   user_1  [3, 1, 1, 3]    3
7   user_2  [2, 4, 1, 0]    3
8   user_3  [4, 2, 0, 1]    3

I would like to calculate the average value of each user with respect to the previous value of time.

For example:

For user_1:-

For time=1 the average is calculated for values of user_1 in time 1= [1, 0, 2, 0]

For time=2, the average is calculated for values of user_1 in time 1 and 2
= [1+5/2, 0+0/2, 2+2/2, 0+2/2]=[3, 0, 2, 1]

For time=3 the average is calculated for values of user_1 in time 2 (which is [3, 0, 2, 1]) and 3 (which is [3, 1, 1, 3])= [3+3/2, 0+1/2, 2+1/2, 1+3/2]=[3, 0.5, 1.5, 2]

and so on for all users.

So, the expected result for user_1 is:

in time 1: [1, 0, 2, 0]
in time 2: [3, 0, 2, 1]
in time 3: [3, 0.5, 1.5, 2]

I tried the following code which

result = (df.groupby('user')['values']
         .agg(lambda x: np.vstack(x).mean(0).round(2))
       )

print(result)

result

user
user_1    [3.0, 0.33, 1.67, 1.67]
user_2    [2.0, 6.67, 0.33, 0.67]
user_3     [5.33, 1.33, 0.0, 1.0]

but it returns the average value for each user with respect to all time! I would like to calculate the average value of each user with respect to current t and previous t-1, like in example.


Solution

  • You can do with for loop

    all = []
    for x , y in  df.groupby('user'): 
        l = [np.array(y['values'].iloc[0])]
        for z in y['values'].iloc[1:] :
            m = np.array(z)
            l.append((m + l[-1])/2)
        all.append(l)
    all
    Out[442]: 
    [[array([1, 0, 2, 0]), array([3., 0., 2., 1.]), array([3. , 0.5, 1.5, 2. ])],
     [array([1, 8, 0, 2]), array([2., 8., 0., 1.]), array([2. , 6. , 0.5, 0.5])],
     [array([6, 2, 0, 0]), array([6., 1., 0., 1.]), array([5. , 1.5, 0. , 1. ])]]