pandas, time-series, coordinates, correlation, pairing

Correlating movement patterns of IDs in time


I have x and y coordinates per ID and used groupby('ID').diff() to compute the change in x and y between consecutive observations, in order to identify each ID's direction of movement. If the directions (xx and yy) are around 0, then the ID didn't move. Now, how can I find the correlations between the different IDs and their directions? Ideally, I would like to pair the IDs that move towards each other, and the IDs that sit "silent". Any help is deeply appreciated!

ID      Time                    X   Y   xx  yy
42403   2019-07-24 08:00:00.255 225 235 1.0 1.0
42386   2019-07-24 08:00:00.255 257 232 -1.0 0.0
42403   2019-07-24 08:00:00.495 226 235 1.0 0.0
42386   2019-07-24 08:00:00.495 257 232 0.0 0.0
42403   2019-07-24 08:00:00.733 226 235 0.0 0.0
42386   2019-07-24 08:00:00.733 257 232 0.0 0.0
42403   2019-07-24 08:00:00.008 224 234 -2.0 -1.0
42386   2019-07-24 08:00:00.008 258 232 1.0 0.0
42403   2019-07-24 08:00:00.255 225 235 1.0 1.0
42386   2019-07-24 08:00:00.255 257 232 -1.0 0.0
42403   2019-07-24 08:00:00.495 226 235 1.0 0.0
42386   2019-07-24 08:00:00.495 257 232 0.0 0.0
42403   2019-07-24 08:00:00.733 226 235 0.0 0.0
42386   2019-07-24 08:00:00.733 257 232 0.0 0.0
42403   2019-07-24 08:00:01.009 224 235 -2.0 0.0
42386   2019-07-24 08:00:01.009 258 232 1.0 0.0
42403   2019-07-24 08:00:01.371 225 235 1.0 0.0
42386   2019-07-24 08:00:01.371 259 232 1.0 0.0
42403   2019-07-24 08:00:01.611 226 235 1.0 0.0
42386   2019-07-24 08:00:01.611 258 232 -1.0 0.0
42403   2019-07-24 08:00:01.736 226 235 0.0 0.0
42386   2019-07-24 08:00:01.736 258 232 0.0 0.0
42403   2019-07-24 08:00:02.066 226 235 0.0 0.0
42386   2019-07-24 08:00:02.066 259 232 1.0 0.0
42403   2019-07-24 08:00:02.281 226 234 0.0 -1.0
42386   2019-07-24 08:00:02.281 259 232 0.0 0.0
42403   2019-07-24 08:00:02.568 226 234 0.0 0.0
42386   2019-07-24 08:00:02.568 259 232 0.0 0.0
42403   2019-07-24 08:00:02.769 225 234 -1.0 0.0
42386   2019-07-24 08:00:02.769 259 232 0.0 0.0
42403   2019-07-24 08:00:03.010 225 234 0.0 0.0
42386   2019-07-24 08:00:03.010 259 232 0.0 0.0
42403   2019-07-24 08:00:03.242 225 233 0.0 -1.0
42386   2019-07-24 08:00:03.242 259 232 0.0 0.0
42403   2019-07-24 08:00:03.574 225 235 0.0 2.0
42386   2019-07-24 08:00:03.574 259 232 0.0 0.0
42403   2019-07-24 08:00:03.760 224 235 -1.0 0.0
42386   2019-07-24 08:00:03.760 259 231 0.0 -1.0
42403   2019-07-24 08:00:03.971 224 234 0.0 -1.0
42386   2019-07-24 08:00:03.971 259 232 0.0 1.0
42403   2019-07-24 08:00:04.231 224 234 0.0 0.0
42386   2019-07-24 08:00:04.231 259 232 0.0 0.0
42403   2019-07-24 08:00:04.567 224 234 0.0 0.0
42386   2019-07-24 08:00:04.567 259 232 0.0 0.0
42403   2019-07-24 08:00:04.849 223 234 -1.0 0.0
42386   2019-07-24 08:00:04.849 259 232 0.0 0.0
42403   2019-07-24 08:00:05.054 223 234 0.0 0.0
42386   2019-07-24 08:00:05.054 259 232 0.0 0.0
42403   2019-07-24 08:00:05.288 224 235 1.0 1.0
42386   2019-07-24 08:00:05.288 259 232 0.0 0.0
42403   2019-07-24 08:00:05.597 225 234 1.0 -1.0
42386   2019-07-24 08:00:05.597 259 232 0.0 0.0
42403   2019-07-24 08:00:05.783 222 232 -3.0 -2.0
42386   2019-07-24 08:00:05.783 259 232 0.0 0.0
42403   2019-07-24 08:00:06.014 222 233 0.0 1.0
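
For reference, the xx and yy columns described above could be derived roughly like this. It is only a sketch: it uses six rows copied from the table, assumes the rows are already in time order, and the variable name `steps` is made up for illustration.

```python
import pandas as pd

# Six rows copied from the table above (positions only; xx and yy are recomputed).
df = pd.DataFrame({
    "ID": [42403, 42386, 42403, 42386, 42403, 42386],
    "Time": pd.to_datetime([
        "2019-07-24 08:00:01.009", "2019-07-24 08:00:01.009",
        "2019-07-24 08:00:01.371", "2019-07-24 08:00:01.371",
        "2019-07-24 08:00:01.611", "2019-07-24 08:00:01.611",
    ]),
    "X": [224, 258, 225, 259, 226, 258],
    "Y": [235, 232, 235, 232, 235, 232],
})

# Difference between consecutive positions, computed separately for each ID.
# The first observation of every ID has no predecessor, so it becomes NaN.
steps = df.groupby("ID")[["X", "Y"]].diff()
df["xx"] = steps["X"]
df["yy"] = steps["Y"]

print(df)
```

The values after the first row of each ID match the corresponding xx and yy entries in the table.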

Solution

  • I would first reshape the data so that the values you want to correlate sit in separate columns (one per ID), and then use pandas' built-in .corr() method. unstack is an easy way to do that reshaping.

    Here is my code:

    # Assuming the position measurements are already sorted by time and the two IDs
    # strictly alternate, give every time step an index and call it 'measure'
    df['measure'] = [i // 2 for i in range(len(df))]  # 2 IDs in total
    # the time data is no longer needed
    df = df.drop(columns='Time')

    # this line does all the work, using the .corr() method from pandas
    df.set_index(['ID', 'measure']).unstack('ID').corr()
    

    The output is a correlation matrix. I plotted it as a heatmap using seaborn:

    [image: heatmap of the correlation matrix]

    If you are also interested in the visualisation, take a look at the heatmap function in seaborn.
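
As a self-contained illustration of the reshaping step, here is a minimal sketch on eight time steps taken from the question's table (08:00:01.009 to 08:00:02.769). It makes two assumptions that are not in the code above: the step index is built with `groupby('ID').cumcount()` instead of `// 2`, so the number of IDs is not hard-coded, and only the xx and yy columns are correlated. The names `wide` and `corr` are chosen for this example.

```python
import pandas as pd

# xx / yy values per ID, copied from the question's table.
xx = {42403: [-2, 1, 1, 0, 0, 0, 0, -1], 42386: [1, 1, -1, 0, 1, 0, 0, 0]}
yy = {42403: [0, 0, 0, 0, 0, -1, 0, 0], 42386: [0, 0, 0, 0, 0, 0, 0, 0]}

# Rebuild the long format of the question: rows alternate between the two IDs.
rows = []
for step in range(8):
    for id_ in (42403, 42386):
        rows.append({"ID": id_, "xx": xx[id_][step], "yy": yy[id_][step]})
df = pd.DataFrame(rows)

# Number the observations of each ID 0, 1, 2, ... (assumes rows are in time order).
df["measure"] = df.groupby("ID").cumcount()

# One row per time step, one column per (variable, ID) pair.
wide = df.set_index(["ID", "measure"])[["xx", "yy"]].unstack("ID")

# Pearson correlation between every pair of columns.
corr = wide.corr()
print(corr)

# Correlation between the x-direction steps of the two IDs.
print(corr.loc[("xx", 42403), ("xx", 42386)])
```

Two things to keep in mind when reading the result. A column that never changes (here yy of ID 42386) has zero variance, so its correlations come out as NaN, which is one way a "silent" ID shows up. And a negative correlation in xx only says that the two IDs tend to step in opposite x directions; whether that means towards or away from each other depends on their relative positions, which the correlation alone does not capture.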