Calculate dynamic centroids for each X row in pandas


The following code gives me a general centroid of all of the readings of my df.

import numpy as np

pos = df4[['x', 'y']].to_numpy()  # all x, y coordinates in df4

def centroid(arr):
    length = arr.shape[0]
    sum_x = np.sum(arr[:, 0])
    sum_y = np.sum(arr[:, 1])
    return sum_x / length, sum_y / length

coll_cps = np.array(centroid(pos))  # overall centroid across all ids

How can I create a new column with temporary centroids for each person ID over, let's say, every 10 readings?

My df looks like this:

          x    y    id   time
0       162  282  2700      0
1       162  282  2819      0
2       162  282  2820      0
3       449  235  2700      1
4       449  235  2820      1
5       449  235  2819      1
6       457  293  2819      2
7       457  293  2820      2
8       457  293  2700      2
9       164  283  2700      3
10      164  283  2819      3
11      164  283  2820      3
12      457  293  2700      4
13      457  293  2820      4
14      457  293  2819      4
15      450  235  2700      5
16      450  235  2820      5
17      450  235  2819      5
18      449  234  2700      6
19      449  234  2819      6
20      449  234  2820      6
21      456  293  2820      7
22      456  293  2819      7
23      456  293  2700      7
24      167  277  2820      8
25      167  277  2700      8
26      167  277  2819      8
27      167  277  2820      9
28      167  277  2700      9
29      167  277  2819      9
...  ...   ...    ...

The output should be a new column with the temporary centroids between the ids within a window of N rows, 10 for instance. So the average centroid per 10 readings at a time.

So, for 10 rows at a time, append the average centroid for each id.


Solution

  • Add a helper column that identifies each group, then use the groupby and apply pattern:

    import pandas as pd
    
    # some data
    x_vals = [1, 2, 3, 10, 11, 20]
    y_vals = [2, 4, 6, 0, 10, 0]
    
    data = {'x': x_vals, 'y': y_vals}
    
    df = pd.DataFrame(data)
    
    group_size = 3
    
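    # make a helper column with the group number (consecutive blocks of rows)
    df['group'] = df.index // group_size
    
    def centroid(group):
        # group is the sub-DataFrame holding the rows of one group
        return (group.x.mean(), group.y.mean())
    
    df_centroids = df.groupby('group').apply(centroid)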
    
    print(df)
    print()
    print(df_centroids)
    

    Yields:

        x   y  group
    0   1   2      0
    1   2   4      0
    2   3   6      0
    3  10   0      1
    4  11  10      1
    5  20   0      1
    
    group
    0                                  (2.0, 4.0)
    1    (13.666666666666666, 3.3333333333333335)
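
The example above groups consecutive rows regardless of id. A minimal sketch of a per-id variant, closer to what the question asks for, might look like the following. It assumes "every 10 readings" means blocks of 10 consecutive readings per id in time order; the sample data and the column names `block`, `cx` and `cy` are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical sample shaped like the question's frame: 3 ids, 20 time steps.
ids = [2700, 2819, 2820]
rows = [
    {"x": 100 + 10 * t + i, "y": 200 + t, "id": pid, "time": t}
    for t in range(20)
    for i, pid in enumerate(ids)
]
df = pd.DataFrame(rows)

window = 10  # readings per id in each block (assumed interpretation)

# Number each id's readings 0, 1, 2, ... in time order, then cut them into blocks.
df = df.sort_values(["id", "time"])
df["block"] = df.groupby("id").cumcount() // window

# transform("mean") broadcasts each (id, block) mean back onto every row,
# so the centroid lands in new columns aligned with the original readings.
grouped = df.groupby(["id", "block"])
df["cx"] = grouped["x"].transform("mean")
df["cy"] = grouped["y"].transform("mean")

# Restore the original row order.
df = df.sort_index()

print(df.head(6))
```

If the centroid should instead be taken across all ids within each time window, grouping by `df["time"] // window` alone (without `id`) would be the analogous change.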