
Calculating distance between multiple points at the same time of the day


I have two dataframes, one with my boat GPS positions (5512 records) and one with fishing boats positions (35381 records). I want to calculate the distance between my boat and all other fishing boats that were present in the area at the same time of the day (to the minute).

I created an IDdatecode (yyyymmddhhmm) for all the positions, then merged the two dataframes on that IDdatecode:

merged_table<- merge(myboat,fishboats,by="IDdatecode",all.y=TRUE)

To calculate the distance I used the formula:

merged_table$distance_between_vessels=distm(c("lon1","lat1"),c("lon2","lat2"),fun=distGeo)

where lon1, lat1 are my boat positions and lon2, lat2 are fishing boats.

But I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "distance_between_vessels", value = NA_real_) : 
  replacement has 1 row, data has 35652
In addition: Warning messages:
1: In .pointsToMatrix(x) : NAs introduced by coercion
2: In .pointsToMatrix(y) : NAs introduced by coercion

What I tried so far is:

  1. use this other formula: merged_table$distance_between_vessels=distGeo(c("lon1","lat1"),c("lon2","lat2"))
  2. convert all the lat and lon columns with as.numeric
  3. use only interval times where both my boat and fishing boats were present
  4. ignore the warning and keep going

But I still get only a list of NAs.

I used the function "distGeo" on a much simpler dataset (only my boat's positions), where I manually calculated the distance between the first and second point, then between the second and third, and so on. The function works perfectly: it gives exactly the right distance between two points (I checked it in ArcGIS). This is what I did:

distGeo(mydata[1, ], mydata[2, ])
distGeo(mydata[2, ], mydata[3, ])
distGeo(mydata[3, ], mydata[4, ])

So, I want to calculate 'one-to-many' distances based on a unique time of the day, but I get the above error. Any ideas on why? Thanks :)

Here are the first 10 rows of the merged table:

structure(list(Record = 1:10, IDdatecode = structure(c(1L, 2L, 
3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L), .Label = c("d201805081203", 
"d201805081204", "d201805081205", "d201805081206", "d201805081207", 
"d201805081208"), class = "factor"), lon1 = c(12.40203333, 12.4071, 
12.41165, 12.41165, 12.41485, 12.41485, 12.41663333, 12.41663333, 
12.41841667, 12.41841667), lat1 = c(45.1067, 45.10921667, 45.11218333, 
45.11218333, 45.11303333, 45.11303333, 45.11313333, 45.11313333, 
45.11348333, 45.11348333), boat1 = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = "RB", class = "factor"), lon2 = c(13.02718, 
13.02585827, 13.02453654, 13.02173, 13.02321482, 13.02052301, 
13.02189309, 13.01931602, 13.02057136, 13.01810904), lat2 = c(44.98946, 
44.99031749, 44.99117498, 44.98792, 44.99203246, 44.98868065, 
44.99288995, 44.98944129, 44.99374744, 44.99020194), boat2 = structure(c(1L, 
1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("IMPERO II", 
"MISTRAL"), class = "factor")), .Names = c("Record", "IDdatecode", 
"lon1", "lat1", "boat1", "lon2", "lat2", "boat2"), row.names = c(NA, 
-10L), class = "data.frame")

Solution

  • V2, Update (January 17, 2022)

    Glad it works for you. If you would like to avoid for-loops, you could consider a dplyr approach:

      library(dplyr)
      
      df <- silvia %>%
        rowwise() %>% 
        mutate(distance = geosphere::distGeo(c(lon1, lat1), c(lon2, lat2)))
      df
    

    The base R apply family (for example mapply()) would be another option.
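
    As a rough sketch of that option: distGeo() also accepts two-column matrices of longitude/latitude, so a single call can handle every row without a loop, and mapply() gives the same result one row at a time. The example assumes the geosphere package is installed and uses only the first three rows of the sample data; the names pts, d_vec and d_map are purely illustrative.

    ```r
    library(geosphere)

    # First three rows of the sample data (lon/lat in decimal degrees)
    pts <- data.frame(
      lon1 = c(12.40203333, 12.4071, 12.41165),
      lat1 = c(45.1067, 45.10921667, 45.11218333),
      lon2 = c(13.02718, 13.02585827, 13.02453654),
      lat2 = c(44.98946, 44.99031749, 44.99117498)
    )

    # Vectorised: row i of the first matrix is paired with row i of the second
    d_vec <- distGeo(cbind(pts$lon1, pts$lat1), cbind(pts$lon2, pts$lat2))

    # apply-family equivalent: one distGeo() call per row
    d_map <- mapply(
      function(lon1, lat1, lon2, lat2) distGeo(c(lon1, lat1), c(lon2, lat2)),
      pts$lon1, pts$lat1, pts$lon2, pts$lat2
    )

    # Both approaches should agree
    all.equal(d_vec, d_map)
    ```

    On the full merged table the vectorised form is probably the simplest choice, since it avoids a per-row function call.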


    V1 (January 16, 2022)

    Hopefully this approach helps. For-loops are often discouraged in R, but I used one here because it is easy to follow.

    I made the following assumptions:

    • boat1 is your boat. lat1 and lon1 represent the position of boat1 for any IDdatecode;
    • as I did not fully understand what you mean by "based on a unique time of the day", I assumed looping over each row is sufficient;
    • the function distGeo() is from the geosphere package.

    # loading your dataframe as "silvia"
    silvia <- structure(list(Record = 1:10, IDdatecode = structure(c(1L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L),
              .Label = c("d201805081203","d201805081204", "d201805081205", "d201805081206", "d201805081207", "d201805081208"),
              class = "factor"), lon1 = c(12.40203333, 12.4071, 12.41165, 12.41165, 12.41485, 12.41485, 12.41663333, 
              12.41663333, 12.41841667, 12.41841667), lat1 = c(45.1067, 45.10921667, 45.11218333, 45.11218333, 45.11303333, 
              45.11303333, 45.11313333, 45.11313333, 45.11348333, 45.11348333), boat1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
              1L, 1L, 1L), .Label = "RB", class = "factor"), lon2 = c(13.02718, 13.02585827, 13.02453654, 13.02173, 13.02321482,
              13.02052301, 13.02189309, 13.01931602, 13.02057136, 13.01810904), lat2 = c(44.98946, 44.99031749, 44.99117498, 44.98792,
              44.99203246, 44.98868065, 44.99288995, 44.98944129, 44.99374744, 44.99020194), boat2 = structure(c(1L, 1L, 1L, 2L,
              1L, 2L, 1L, 2L, 1L, 2L), .Label = c("IMPERO II", "MISTRAL"), class = "factor")), .Names = c("Record", "IDdatecode", 
              "lon1", "lat1", "boat1", "lon2", "lat2", "boat2"), row.names = c(NA, -10L), class = "data.frame")
    
    
    # for EACH ROW in "silvia" calculate the distance between c("lon1", "lat1") and c("lon2", "lat2")
    for (i in seq_len(nrow(silvia))){
    
      silvia$distance[i] <- geosphere::distGeo(c(silvia[i, "lon1"], silvia[i,"lat1"]), 
                                    c(silvia[i, "lon2"], silvia[i,"lat2"])) 
    
    }
    
    
    # here you see the first 5 entries of the df "silvia"
    # the distances are calculated in metres 
    # the parameters a and f are set to WGS84 by default.
    head(silvia, n=5)
    #>   Record    IDdatecode     lon1     lat1 boat1     lon2     lat2     boat2
    #> 1      1 d201805081203 12.40203 45.10670    RB 13.02718 44.98946 IMPERO II
    #> 2      2 d201805081204 12.40710 45.10922    RB 13.02586 44.99032 IMPERO II
    #> 3      3 d201805081205 12.41165 45.11218    RB 13.02454 44.99117 IMPERO II
    #> 4      4 d201805081205 12.41165 45.11218    RB 13.02173 44.98792   MISTRAL
    #> 5      5 d201805081206 12.41485 45.11303    RB 13.02321 44.99203 IMPERO II
    #>   distance
    #> 1 50943.77
    #> 2 50503.93
    #> 3 50118.46
    #> 4 50005.52
    #> 5 49774.51
    

    Note. Created on 2022-01-16 by the reprex package (v2.0.1)
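
    As for why the original call failed: c("lon1","lat1") is a character vector holding the two column names, not the columns themselves. geosphere tries to coerce those strings to numbers, which gives the "NAs introduced by coercion" warnings and a single NA that cannot be assigned to a 35652-row column. Also note that distm() returns a matrix of all pairwise distances, whereas distGeo() pairs points row by row, which is what a merged table needs. A minimal base R illustration of the coercion, using a hypothetical two-row stand-in for merged_table:

    ```r
    # Quoted names are plain strings, not the columns they name
    p <- c("lon1", "lat1")
    is.character(p)
    coerced <- suppressWarnings(as.numeric(p))  # both elements become NA
    coerced

    # Hypothetical two-row stand-in for the real merged_table
    merged_table <- data.frame(
      lon1 = c(12.40203333, 12.4071),
      lat1 = c(45.1067, 45.10921667)
    )

    # Taking the columns from the data frame yields a numeric lon/lat matrix,
    # one row per record, which is the kind of input distGeo() expects
    p1 <- cbind(merged_table$lon1, merged_table$lat1)
    is.numeric(p1)
    dim(p1)
    ```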