r spatial matching

Matching points and calculating total distance/accuracy


I have a list, list.response, with 309 data frames. Each data frame has 10 rows and two columns. The columns hold X and Y coordinates and represent "clicks" that a survey respondent made on a picture.

Furthermore, I have another data frame, df.true, with 10 XY coordinates. These are the coordinates of the objects that the respondent tried to click on in the survey.

GOAL: For each respondent (i.e., each data frame in list.response), I want to calculate how accurate they were when trying to click on the objects. In other words: what is the distance between the coordinates of their 10 clicks and the coordinates of the 10 objects in df.true?

My problem is that the click coordinates and the object coordinates are not in the same order. For instance, respondent A might have clicked on the objects from left to right, whereas respondent B might have clicked from right to left, so row i of a respondent's data frame does not necessarily correspond to row i of df.true. Therefore, I need to match each respondent's clicks with the nearest objects. The criteria for matching are:

  • The spatial distance between a click and an object should be as small as possible.
  • One click can only be matched with one object and vice versa (i.e., once a click and an object are matched, neither should be used in any other match, even if that would give a shorter distance).

Finally, I want to calculate the total distance between all the matched points (i.e., the sum of the distances over all matched click-object pairs). This will be my measure of the respondent's overall accuracy in clicking on the objects.
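To make the goal concrete, here is a minimal base R sketch on made-up toy data (three objects and three clicks, not taken from the survey). The helper name dist_matrix and the index vector intended are hypothetical and only serve the illustration. It shows why comparing row i with row i fails when the click order differs, and what the summed distance looks like once clicks are paired with the right objects.

```r
# Toy data (hypothetical, not from the survey): three objects, and clicks on
# the same three objects made in a different order and a few pixels off.
objects <- data.frame(X = c(10, 50, 90), Y = c(10, 50, 90))
clicks  <- data.frame(X = c(93, 10, 50), Y = c(94, 13, 50))

# Pairwise Euclidean distances: rows are clicks, columns are objects.
dist_matrix <- function(a, b) {
  dx <- outer(a$X, b$X, "-")
  dy <- outer(a$Y, b$Y, "-")
  sqrt(dx^2 + dy^2)
}
d <- dist_matrix(clicks, objects)

# Comparing row i with row i ignores the click order and inflates the total.
naive_total <- sum(diag(d))

# Pairing each click with the object it was aimed at gives the accuracy score.
intended <- c(3, 1, 2)                       # click 1 -> object 3, and so on
matched_total <- sum(d[cbind(1:3, intended)])
matched_total                                # 8 (= 5 + 3 + 0)
```

In the real data the vector of intended objects is unknown, which is exactly what the matching step has to recover.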


I have looked at several solutions to somewhat similar problems (see Working with spatial data: How to find the nearest neighbour of points without replacement? and https://gis.stackexchange.com/questions/297153/excluding-point-from-nearest-neighbor-search-once-its-been-matched-using-r), but I haven't been able to make them work in my case. Disclaimer: I'm new to R and programming.

I hope someone is able to help me?


DATA FOR REPRODUCIBLE EXAMPLE:

Sample of 20 data frames from the list of clicks:

list.response <- list(structure(list(X = c(536, 160, 467, 552, 476, 242, 355, 
414, 556, 0), Y = c(91, 181, 128, 84, 52, 379, 434, 528, 551, 
0)), row.names = c(NA, -10L), class = "data.frame"), structure(list(
    X = c(536, 542, 455, 148, 70, 239, 369, 416, 553, 0), Y = c(91, 
    94, 110, 185, 98, 387, 427, 509, 554, 0)), row.names = c(NA, 
-10L), class = "data.frame"), structure(list(X = c(536, 160, 
232, 374, 425, 561, 461, 544, 473, 0), Y = c(91, 193, 380, 426, 
513, 559, 105, 97, 37, 0)), row.names = c(NA, -10L), class = "data.frame"), 
    structure(list(X = c(536, 156, 240, 375, 455, 476, 549, 414, 
    547, 0), Y = c(91, 194, 389, 425, 116, 37, 87, 494, 553, 
    0)), row.names = c(NA, -10L), class = "data.frame"), structure(list(
        X = c(536, 70, 455, 543, 482, 241, 368, 418, 551, 0), 
        Y = c(91, 99, 107, 93, 47, 385, 427, 511, 552, 0)), row.names = c(NA, 
    -10L), class = "data.frame"), structure(list(X = c(536, 480, 
    458, 81, 158, 231, 393, 409, 558, 0), Y = c(91, 35, 91, 114, 
    175, 385, 423, 508, 562, 0)), row.names = c(NA, -10L), class = "data.frame"), 
    structure(list(X = c(536, 67, 492, 460, 542, 240, 364, 407, 
    554, 0), Y = c(91, 98, 48, 108, 98, 391, 428, 507, 553, 0
    )), row.names = c(NA, -10L), class = "data.frame"), structure(list(
        X = c(536, 156, 240, 371, 409, 563, 449, 480, 547, 0), 
        Y = c(91, 194, 387, 414, 510, 549, 110, 44, 96, 0)), row.names = c(NA, 
    -10L), class = "data.frame"), structure(list(X = c(536, 485, 
    462, 419, 556, 371, 240, 156, 71, 0), Y = c(91, 50, 110, 
    499, 556, 423, 380, 183, 96, 0)), row.names = c(NA, -10L), class = "data.frame"), 
    structure(list(X = c(536, 423, 362, 76, 156, 243, 551, 480, 
    455, 0), Y = c(91, 505, 434, 103, 187, 386, 547, 50, 114, 
    0)), row.names = c(NA, -10L), class = "data.frame"), structure(list(
        X = c(536, 155, 245, 359, 414, 552, 456, 535, 483, 0), 
        Y = c(91, 185, 391, 423, 508, 544, 119, 92, 48, 0)), row.names = c(NA, 
    -10L), class = "data.frame"), structure(list(X = c(536, 419, 
    366, 242, 155, 76, 451, 538, 480, 0), Y = c(91, 510, 425, 
    393, 190, 103, 107, 96, 53, 0)), row.names = c(NA, -10L), class = "data.frame"), 
    structure(list(X = c(536, 412, 369, 243, 153, 76, 458, 481, 
    543, 0), Y = c(91, 512, 425, 386, 187, 100, 114, 48, 96, 
    0)), row.names = c(NA, -10L), class = "data.frame"), structure(list(
        X = c(536, 483, 457, 151, 73, 241, 368, 416, 552, 0), 
        Y = c(91, 45, 108, 186, 99, 386, 426, 507, 556, 0)), row.names = c(NA, 
    -10L), class = "data.frame"), structure(list(X = c(536, 151, 
    483, 455, 544, 239, 368, 418, 547, 0), Y = c(91, 182, 43, 
    104, 96, 388, 426, 508, 554, 0)), row.names = c(NA, -10L), class = "data.frame"), 
    structure(list(X = c(536, 418, 368, 238, 154, 73, 454, 482, 
    543, 0), Y = c(91, 510, 430, 387, 184, 100, 110, 48, 93, 
    0)), row.names = c(NA, -10L), class = "data.frame"), structure(list(
        X = c(536, 481, 455, 149, 70, 240, 369, 417, 555, 0), 
        Y = c(91, 46, 109, 184, 99, 386, 427, 509, 555, 0)), row.names = c(NA, 
    -10L), class = "data.frame"), structure(list(X = c(536, 456, 
    541, 148, 71, 244, 370, 418, 555, 0), Y = c(91, 110, 96, 
    186, 88, 389, 427, 511, 553, 0)), row.names = c(NA, -10L), class = "data.frame"), 
    structure(list(X = c(536, 454, 240, 151, 71, 541, 366, 416, 
    551, 0), Y = c(91, 108, 389, 183, 99, 92, 428, 510, 552, 
    0)), row.names = c(NA, -10L), class = "data.frame"), structure(list(
        X = c(536, 147, 476, 499, 553, 244, 385, 417, 557, 0), 
        Y = c(91, 185, 110, 38, 87, 397, 433, 506, 552, 0)), row.names = c(NA, 
    -10L), class = "data.frame"))

And the df.true coordinates:

df.true <- structure(list(X = c(71, 151, 240, 370, 415, 552, 542, 456, 482, 
0), Y = c(99, 186, 387, 429, 509, 553, 91, 108, 45, 0)), row.names = c(NA, 
-10L), class = "data.frame") 

Solution

  • I came up with a solution. First, I converted all data frames to matrices. I then used the pairup function from https://gis.stackexchange.com/questions/297153/excluding-point-from-nearest-neighbor-search-once-its-been-matched-using-r, which walks through the clicks in order and pairs each one with the nearest object that has not been matched yet:

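    # Greedy one-to-one matching (requires the FNN package): for each row of
    # list1, in order, pick the nearest row of list2 that is still unused.
    pairup <- function(list1, list2){
      keep <- seq_len(nrow(list2))   # original row indices of list2 still available
      used <- c()                    # matched list2 index for each row of list1
      for(i in seq_len(nrow(list1))){
        # position of the nearest remaining point in list2
        nearest <- FNN::get.knnx(list2, list1[i,,drop=FALSE], 1)$nn.index[1,1]
        used <- c(used, keep[nearest])
        # drop the matched point so it cannot be matched again
        keep <- keep[-nearest]
        list2 <- list2[-nearest,,drop=FALSE]
      }
      used
    }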
    

    And then I ran the for loop:

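    library(raster)  # provides pointDistance()

    # Convert the data frames to matrices, as described above
    list.response <- lapply(list.response, as.matrix)
    df.true <- as.matrix(df.true)

    # Preallocate one total distance per respondent
    pm <- numeric(length(list.response))

    # Match each respondent's clicks to objects and sum the matched distances
    for (i in seq_along(list.response)) {
      idx <- pairup(list.response[[i]], df.true)
      pm[i] <- sum(pointDistance(list.response[[i]], df.true[idx, ], lonlat = FALSE))
    }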
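For readers who want to try the idea without extra packages, the following is a rough base R sketch of the same greedy matching, assuming there are at least as many objects as clicks. The function name greedy_total and the toy data are hypothetical and not part of the original answer. Ties may be broken differently than in FNN, so totals are not guaranteed to be identical to the loop above in every case.

```r
# Hypothetical helper (not from the original answer): greedy one-to-one
# matching on a distance matrix, returning the summed matched distance.
greedy_total <- function(clicks, objects) {
  clicks  <- as.matrix(clicks)
  objects <- as.matrix(objects)
  # d[i, j] = Euclidean distance from click i to object j
  d <- sqrt(outer(clicks[, 1], objects[, 1], "-")^2 +
            outer(clicks[, 2], objects[, 2], "-")^2)
  total <- 0
  for (i in seq_len(nrow(d))) {
    j <- which.min(d[i, ])   # nearest object that is still free
    total <- total + d[i, j]
    d[, j] <- Inf            # mark object j as used
  }
  total
}

# Toy data (made up): two objects on a line and two clicks between them.
objects <- data.frame(X = c(0, 10), Y = c(0, 0))
clicks  <- data.frame(X = c(6, 9),  Y = c(0, 0))

greedy_total(clicks, objects)          # 13: click 1 takes object 2 first
greedy_total(clicks[2:1, ], objects)   # 7: same clicks, reversed order
```

On the survey data this could be applied as sapply(list.response, greedy_total, objects = df.true). The toy example also shows a property of greedy matching in general: the total depends on the order in which the clicks are processed, so it is not necessarily the smallest possible total. If the smallest total over all one-to-one pairings is needed, that is a linear assignment problem, for which existing solvers such as clue::solve_LSAP() are one option.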