Search code examples
pythonpython-3.xpandasdataframepandas-groupby

Adding new value through groupby operation using for cycle


I need to add a column with changes ow worker coordinates through different stages. We have a DataFrame:

import pandas as pd
from geopy.distance  import geodesic as GD

d = {'user_id': [26, 26, 26, 26, 26, 26, 9, 9, 9, 9],
            'worker_latitude': [55.114410, 55.114459, 55.114379, 
55.114462, 55.114372, 55.114389, 65.774064, 65.731034, 65.731034, 65.774057], 
            'worker_longitude': [38.927155, 38.927114, 38.927101, 38.927156,
 38.927258, 38.927120, 37.532380, 37.611746, 37.611746, 37.532346],
    'change':[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}

df = pd.DataFrame(data=d)

which looks like:

   user_id  worker_latitude  worker_longitude  change
0       26        55.114410         38.927155       0
1       26        55.114459         38.927114       0
2       26        55.114379         38.927101       0
3       26        55.114462         38.927156       0
4       26        55.114372         38.927258       0
5       26        55.114389         38.927120       0
6        9        65.774064         37.532380       0
7        9        65.731034         37.611746       0
8        9        65.731034         37.611746       0
9        9        65.774057         37.532346       0

Then I need to count difference between person previous and current stage. So I use a function:

for group in df.groupby(by='user_id'):
    group[1].reset_index(inplace=True,drop=True)
    for i in range(1,len(group[1])):
        first_xy=(group[1]['worker_latitude'][i-1],group[1]['worker_longitude'][i-1])
        second_xy=(group[1]['worker_latitude'][i],group[1]['worker_longitude'][i])
        print((round((GD(first_xy, second_xy).km),6)))
        group[1]['change'][i]=round((GD(first_xy, second_xy).km),6)

And then I get:

6.021576
0.0
6.021896
0.00605
0.008945
0.009884
0.011948
0.009007
display(df)
   user_id  worker_latitude  worker_longitude  change
0       26        55.114410         38.927155       0
1       26        55.114459         38.927114       0
2       26        55.114379         38.927101       0
3       26        55.114462         38.927156       0
4       26        55.114372         38.927258       0
5       26        55.114389         38.927120       0
6        9        65.774064         37.532380       0
7        9        65.731034         37.611746       0
8        9        65.731034         37.611746       0
9        9        65.774057         37.532346       0

Which means that values are counted correctly, but for some reason they don't fit into 'change' column. What can be done?


Solution

  • It doesn't works because you're accessing a copy of your DataFrame and trying to assign value to it.

    However, it seems instead of iterating over the DataFrame inside groupby, it seems more intuitive to use groupby + shift to get the first_xys first; then apply a custom function that applies GD between first_xy and second_xy to each row:

    def func(x):
        if x.notna().all():
            first_xy = (x['prev_lat'], x['prev_long'])
            second_xy = (x['worker_latitude'], x['worker_longitude'])
            return round((GD(first_xy, second_xy).km), 6)
        else:
            return float('nan')
    
    g = df.groupby('user_id')
    df['prev_lat'] = g['worker_latitude'].shift()
    df['prev_long'] = g['worker_longitude'].shift()
    df['change'] = df.apply(func, axis=1)
    df = df.drop(columns=['prev_lat','prev_long'])
    

    Output:

       user_id  worker_latitude  worker_longitude    change
    0       26        55.114410         38.927155       NaN
    1       26        55.114459         38.927114  0.006050
    2       26        55.114379         38.927101  0.008945
    3       26        55.114462         38.927156  0.009884
    4       26        55.114372         38.927258  0.011948
    5       26        55.114389         38.927120  0.009007
    6        9        65.774064         37.532380       NaN
    7        9        65.731034         37.611746  6.021576
    8        9        65.731034         37.611746  0.000000
    9        9        65.774057         37.532346  6.021896