python pandas position coordinates pandas-groupby

Calculate dynamic centroids for each X row in pandas

The following code gives me the overall centroid of all the readings in my DataFrame.

import numpy as np

pos = df4[['x', 'y']].to_numpy()  # all x/y coordinates in df4

def centroid(arr):
    length = arr.shape[0]
    sum_x = np.sum(arr[:, 0])
    sum_y = np.sum(arr[:, 1])
    return sum_x/length, sum_y/length

coll_cps = np.array(centroid(pos))  # overall centroid across all ids

How can I create a new column with temporary centroids of each person ID for, let's say, every 10 readings?

My df looks like this:

          x    y    id   time
0       162  282  2700      0
1       162  282  2819      0
2       162  282  2820      0
3       449  235  2700      1
4       449  235  2820      1
5       449  235  2819      1
6       457  293  2819      2
7       457  293  2820      2
8       457  293  2700      2
9       164  283  2700      3
10      164  283  2819      3
11      164  283  2820      3
12      457  293  2700      4
13      457  293  2820      4
14      457  293  2819      4
15      450  235  2700      5
16      450  235  2820      5
17      450  235  2819      5
18      449  234  2700      6
19      449  234  2819      6
20      449  234  2820      6
21      456  293  2820      7
22      456  293  2819      7
23      456  293  2700      7
24      167  277  2820      8
25      167  277  2700      8
26      167  277  2819      8
27      167  277  2820      9
28      167  277  2700      9
29      167  277  2819      9
...  ...   ...    ...

The output should be a new column with the temporary centroid for each id, computed over a fixed number of rows (10, for instance).

In other words, for every 10 rows at a time, append the average centroid for each id.

Solution

Add a helper column that identifies each row's group, then use the groupby-and-apply pattern:

import pandas as pd

# some data
x_vals = [1, 2, 3, 10, 11, 20]
y_vals = [2, 4, 6, 0, 10, 0]

data = {'x': x_vals, 'y': y_vals}

df = pd.DataFrame(data)

group_size = 3

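# make a helper column with the group number (assumes the default 0..n-1 index)
df['group'] = df.index // group_size

def centroid(group):
    # group is the sub-DataFrame holding the rows of one group number
    return (group.x.mean(), group.y.mean())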

df_centroids = df.groupby('group').apply(centroid)

print(df)
print()
print(df_centroids)

Yields:

    x   y  group
0   1   2      0
1   2   4      0
2   3   6      0
3  10   0      1
4  11  10      1
5  20   0      1

group
0                                  (2.0, 4.0)
1    (13.666666666666666, 3.3333333333333335)
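
The example above groups by row position only and returns one tuple per group. To get closer to what the question asks (a centroid per id, written back as new columns on every row), one option is to number each id's readings with cumcount and use transform instead of apply. This is only a sketch under some assumptions: "every 10 readings" is read as 10 consecutive readings per id, the sample data is made up, and the column names block, cx and cy are placeholders I picked for illustration.

```python
import pandas as pd

# Hypothetical data shaped like the question's df4 (values are made up)
df4 = pd.DataFrame({
    "x":    [0, 10, 2, 20, 4, 30, 6, 40],
    "y":    [0, 1, 2, 3, 4, 5, 6, 7],
    "id":   [1, 2, 1, 2, 1, 2, 1, 2],
    "time": [0, 0, 1, 1, 2, 2, 3, 3],
})

window = 2  # readings per id in each block; the question would use 10

# Number each id's readings 0, 1, 2, ... in row order,
# then integer-divide to get a block number per id
df4["block"] = df4.groupby("id").cumcount() // window

# transform("mean") returns one value per original row,
# so the result can be assigned straight back as a column
keys = ["id", "block"]
df4["cx"] = df4.groupby(keys)["x"].transform("mean")
df4["cy"] = df4.groupby(keys)["y"].transform("mean")

print(df4)
```

Every row now carries the centroid of its own (id, block) group in cx and cy. If the blocks should instead follow the time column, something like df4["time"] // window could replace the cumcount line, assuming time increases by one per reading.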