I have two dataframes, one with my boat GPS positions (5512 records) and one with fishing boats positions (35381 records). I want to calculate the distance between my boat and all other fishing boats that were present in the area at the same time of the day (to the minute).
I created an IDdatecode (yyyymmddhhmm) for every position, then merged the two dataframes on that IDdatecode. I did this:
merged_table<- merge(myboat,fishboats,by="IDdatecode",all.y=TRUE)
To calculate the distance I used the formula:
merged_table$distance_between_vessels=distm(c("lon1","lat1"),c("lon2","lat2"),fun=distGeo)
where lon1, lat1 are my boat positions and lon2, lat2 are fishing boats.
But I get the following error:
Error in `$<-.data.frame`(`*tmp*`, "distance_between_vessels", value = NA_real_) :
replacement has 1 row, data has 35652
In addition: Warning messages:
1: In .pointsToMatrix(x) : NAs introduced by coercion
2: In .pointsToMatrix(y) : NAs introduced by coercion
What I tried so far is:
But I still get only a list of NAs.
I used the function "distGeo" on a much simpler dataset (only my boat's positions), where I manually calculated the distance between the first and second point, then between the second and third point, and so on. The function works perfectly, as it gives me exactly the right distance between two points (I checked it in ArcGIS). This is what I did:
distGeo(mydata[1, ], mydata[2, ])
distGeo(mydata[2, ], mydata[3, ])
distGeo(mydata[3, ], mydata[4, ])
So, I want to calculate 'one-to-many' distances based on a unique time of the day, but I get the above error. Any ideas on why? Thanks :)
Here are the first 10 rows of the merged table:
structure(list(Record = 1:10, IDdatecode = structure(c(1L, 2L,
3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L), .Label = c("d201805081203",
"d201805081204", "d201805081205", "d201805081206", "d201805081207",
"d201805081208"), class = "factor"), lon1 = c(12.40203333, 12.4071,
12.41165, 12.41165, 12.41485, 12.41485, 12.41663333, 12.41663333,
12.41841667, 12.41841667), lat1 = c(45.1067, 45.10921667, 45.11218333,
45.11218333, 45.11303333, 45.11303333, 45.11313333, 45.11313333,
45.11348333, 45.11348333), boat1 = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = "RB", class = "factor"), lon2 = c(13.02718,
13.02585827, 13.02453654, 13.02173, 13.02321482, 13.02052301,
13.02189309, 13.01931602, 13.02057136, 13.01810904), lat2 = c(44.98946,
44.99031749, 44.99117498, 44.98792, 44.99203246, 44.98868065,
44.99288995, 44.98944129, 44.99374744, 44.99020194), boat2 = structure(c(1L,
1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("IMPERO II",
"MISTRAL"), class = "factor")), .Names = c("Record", "IDdatecode",
"lon1", "lat1", "boat1", "lon2", "lat2", "boat2"), row.names = c(NA,
-10L), class = "data.frame")
V2, Update (January 17, 2022)
Glad it works for you. If you want to avoid for-loops, you could consider a dplyr approach. Have a look.
library(dplyr)
df <- silvia %>%
  rowwise() %>%
  mutate(distance = geosphere::distGeo(c(lon1, lat1), c(lon2, lat2)))
df
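As a further sketch (not part of the original answer): distGeo() also accepts two-column longitude/latitude matrices and pairs them row by row, so the per-row step may not be needed at all. This also hints at why the original distm() call failed: the quoted names "lon1", "lat1" are character strings rather than columns, which get coerced to NA, and distm() builds a full all-pairs matrix rather than one distance per row. The data frame name `boats` and the three-row subset below are assumptions made for illustration; the values are copied from the question's merged table.

```r
library(geosphere)

# Illustrative subset: first three rows of the question's merged table
boats <- data.frame(
  lon1 = c(12.40203333, 12.4071, 12.41165),
  lat1 = c(45.1067, 45.10921667, 45.11218333),
  lon2 = c(13.02718, 13.02585827, 13.02453654),
  lat2 = c(44.98946, 44.99031749, 44.99117498)
)

# Row i of the first matrix is paired with row i of the second,
# giving one distance (in metres, WGS84 by default) per row
boats$distance <- distGeo(
  cbind(boats$lon1, boats$lat1),
  cbind(boats$lon2, boats$lat2)
)
boats
```

The distances should agree with the ones from the for-loop version for the same rows.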
The base R *apply family would be another option.
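A minimal sketch of what that could look like with mapply(), which walks the four coordinate columns in parallel and calls distGeo() once per row. The data frame name `boats` and the three-row subset are assumptions for illustration; the values come from the question's merged table.

```r
library(geosphere)

# Illustrative subset: first three rows of the question's merged table
boats <- data.frame(
  lon1 = c(12.40203333, 12.4071, 12.41165),
  lat1 = c(45.1067, 45.10921667, 45.11218333),
  lon2 = c(13.02718, 13.02585827, 13.02453654),
  lat2 = c(44.98946, 44.99031749, 44.99117498)
)

# mapply() passes the i-th element of each column to the function,
# so each call measures one pair of positions
boats$distance <- mapply(
  function(lon1, lat1, lon2, lat2) {
    distGeo(c(lon1, lat1), c(lon2, lat2))
  },
  boats$lon1, boats$lat1, boats$lon2, boats$lat2
)
boats
```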
V1 (January 16, 2022)
Hopefully this approach helps. For-loops are often discouraged in R, but I used one here because it is easy to understand.
I made the following assumptions:
boat1 is your boat;
lat1 and lon1 represent the position of boat1 for any IDdatecode;
distGeo() is from the geosphere package.
# loading your dataframe as "silvia"
silvia <- structure(list(Record = 1:10, IDdatecode = structure(c(1L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L),
.Label = c("d201805081203","d201805081204", "d201805081205", "d201805081206", "d201805081207", "d201805081208"),
class = "factor"), lon1 = c(12.40203333, 12.4071, 12.41165, 12.41165, 12.41485, 12.41485, 12.41663333,
12.41663333, 12.41841667, 12.41841667), lat1 = c(45.1067, 45.10921667, 45.11218333, 45.11218333, 45.11303333,
45.11303333, 45.11313333, 45.11313333, 45.11348333, 45.11348333), boat1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "RB", class = "factor"), lon2 = c(13.02718, 13.02585827, 13.02453654, 13.02173, 13.02321482,
13.02052301, 13.02189309, 13.01931602, 13.02057136, 13.01810904), lat2 = c(44.98946, 44.99031749, 44.99117498, 44.98792,
44.99203246, 44.98868065, 44.99288995, 44.98944129, 44.99374744, 44.99020194), boat2 = structure(c(1L, 1L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L), .Label = c("IMPERO II", "MISTRAL"), class = "factor")), .Names = c("Record", "IDdatecode",
"lon1", "lat1", "boat1", "lon2", "lat2", "boat2"), row.names = c(NA, -10L), class = "data.frame")
# for EACH ROW in "silvia" calculate the distance between c("lon1", "lat1") and c("lon2", "lat2")
for (i in 1:nrow(silvia)) {
  silvia$distance[i] <- geosphere::distGeo(c(silvia[i, "lon1"], silvia[i, "lat1"]),
                                           c(silvia[i, "lon2"], silvia[i, "lat2"]))
}
# here you see the first 5 entries of the df "silvia"
# the distances are calculated in metres
# the parameters a and f are set to WGS84 by default.
head(silvia, n=5)
#> Record IDdatecode lon1 lat1 boat1 lon2 lat2 boat2
#> 1 1 d201805081203 12.40203 45.10670 RB 13.02718 44.98946 IMPERO II
#> 2 2 d201805081204 12.40710 45.10922 RB 13.02586 44.99032 IMPERO II
#> 3 3 d201805081205 12.41165 45.11218 RB 13.02454 44.99117 IMPERO II
#> 4 4 d201805081205 12.41165 45.11218 RB 13.02173 44.98792 MISTRAL
#> 5 5 d201805081206 12.41485 45.11303 RB 13.02321 44.99203 IMPERO II
#> distance
#> 1 50943.77
#> 2 50503.93
#> 3 50118.46
#> 4 50005.52
#> 5 49774.51
Note. Created on 2022-01-16 by the reprex package (v2.0.1)