I have a CSV of places with geographic coordinates in degrees minutes seconds format but with no separators like this:
df <- data.frame(name = c("farm_1", "farm_2", "seabrook_1", "rocks_road"),
lat = c(425319.3, 425317, 425317.1, 425323.3),
long = c(705045.5, 705101.1, 705145.4, 705219.8))
name lat long
farm_1 425319.3 705045.5
farm_2 425317 705101.1
seabrook_1 425317.1 705145.4
rocks_road 425323.3 705219.8
I have another CSV of places with geographic coordinates in degrees decimal minutes (DMM) format like this:
df_2 <- data.frame(name = c("exeter_road", "hampton_hill", "portsmouth_ave", "pebble_ln"),
GPS_cordinates_DMM = c("N42 58.855 W70 56.473", "N42 58.666 W70 54.981",
"N42 56.579 W70 52.550", "N42 55.949 W70 53.631"))
name GPS_cordinates_DMM
exeter_road N42 58.855 W70 56.473
hampton_hill N42 58.666 W70 54.981
portsmouth_ave N42 56.579 W70 52.550
pebble_ln N42 55.949 W70 53.631
I would like to parse the coordinates in each data frame and convert them to decimal latitude and longitude. For example, the first data frame would look like this:
df_dec <- data.frame(name = c("farm_1", "farm_2", "seabrook_1", "rocks_road"),
latitude = c(42.88869444, 42.88805556, 42.88808333, 42.88980556),
longitude = c(70.84597222, 70.85030556, 70.86261111, 70.87216667))
name latitude longitude
farm_1 42.88869 70.84597
farm_2 42.88806 70.85031
seabrook_1 42.88808 70.86261
rocks_road 42.88981 70.87217
And the second data frame would look like this:
df_2_dec <- data.frame(name = c("exeter_road", "hampton_hill", "portsmouth_ave", "pebble_ln"),
latitude = c(42.98091667, 42.97776667, 42.94298333, 42.93248333),
longitude = c(70.94121667, 70.91635, 70.87583333, 70.89385))
name latitude longitude
exeter_road 42.98092 70.94122
hampton_hill 42.97777 70.91635
portsmouth_ave 42.94298 70.87583
pebble_ln 42.93248 70.89385
Then I can eventually combine and map/analyze them.
Is there a package or function that can parse and convert these coordinate types?
If not, how would you recommend writing one that is robust and can deal with issues such as the missing decimal point in the latitude of the second row of the first dataset?
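Using substr, you can pull the degrees, minutes, and seconds out of each value by position (substring doesn't need an end position), convert them to numeric, and calculate decimal degrees. Both functions coerce the numeric columns to character first, so a value without a decimal point such as 425317 works too.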
# characters 1-2 = degrees, 3-4 = minutes, 5 onward = seconds; sum in seconds, then divide by 3600
f1 <- function(x) (as.numeric(substr(x, 1, 2))*60^2 + as.numeric(substr(x, 3, 4))*60 +
as.numeric(substring(x, 5)))/60^2
res1 <- data.frame(name=df$name, lapply(df[-1], f1))
res1
# name lat long
# 1 farm_1 42.88869 70.84597
# 2 farm_2 42.88806 70.85031
# 3 seabrook_1 42.88808 70.86261
# 4 rocks_road 42.88981 70.87217
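Note that f1 relies on the degrees always taking exactly two characters, so it would misread a longitude of 100 degrees or more, or a latitude below 10 degrees. If that matters for your data, here is a minimal sketch of an arithmetic alternative that does not depend on character positions. The name dms_to_dec is hypothetical, and the sketch assumes the values are packed as DDMMSS.s or DDDMMSS.s with minutes and seconds always occupying two digits each:

```r
# Hypothetical helper (not from any package): packed DMS -> decimal degrees.
# Assumes minutes and seconds each take exactly two digits before the decimal point.
dms_to_dec <- function(x) {
  x    <- as.numeric(x)
  sgn  <- sign(x)                  # keep the sign if negative values are used for S/W
  x    <- abs(x)
  deg  <- x %/% 10000              # everything left of MMSS
  mins <- (x %% 10000) %/% 100     # the MM part
  secs <- x %% 100                 # the SS.s part
  sgn * (deg + mins / 60 + secs / 3600)
}

df <- data.frame(name = c("farm_1", "farm_2", "seabrook_1", "rocks_road"),
                 lat = c(425319.3, 425317, 425317.1, 425323.3),
                 long = c(705045.5, 705101.1, 705145.4, 705219.8))
res1b <- data.frame(name = df$name, lapply(df[-1], dms_to_dec))
res1b
```

For the sample data this should agree with res1 above. Whether it is worth the switch depends on whether your coordinates can ever have one-digit or three-digit degrees.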
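The second format can be split at N, S, E, or W using strsplit and then handled in essentially the same way as the first. Because gsub strips every non-digit (including the decimal point), f2 assumes two-digit degrees and exactly three decimal places in the minutes.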
tmp <- as.data.frame(
gsub("\\D", "", do.call(rbind, strsplit(df_2$GPS_cordinates_DMM, "[NSEW]"))[,-1]))
f2 <- function(x) as.numeric(substr(x, 1, 2)) +
as.numeric(substring(x, 3))/1e3/60
res2 <- data.frame(name=df_2$name, setNames(lapply(tmp, f2), c("lat", "lon")))
res2
# name lat lon
# 1 exeter_road 42.98092 70.94122
# 2 hampton_hill 42.97777 70.91635
# 3 portsmouth_ave 42.94298 70.87583
# 4 pebble_ln 42.93248 70.89385
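If the number of decimal places in the minutes can vary, or you want the hemisphere letter reflected in the sign, a regex-based parse may be more forgiving. The following is only a sketch: dmm_to_dec is a hypothetical name, it assumes every string looks like "N42 58.855 W70 56.473" (latitude first, then longitude), and unlike the expected output in the question it returns negative values for S and W. It uses utils::strcapture, which is available in R 3.4.0 and later.

```r
# Hypothetical helper: parse "N42 58.855 W70 56.473" style strings into signed decimal degrees.
# Assumes latitude comes first and the hemisphere letter precedes the degrees.
dmm_to_dec <- function(x) {
  pat <- "^\\s*([NS])\\s*(\\d+)\\s+(\\d+\\.?\\d*)\\s+([EW])\\s*(\\d+)\\s+(\\d+\\.?\\d*)\\s*$"
  proto <- data.frame(lat_h = character(), lat_d = numeric(), lat_m = numeric(),
                      lon_h = character(), lon_d = numeric(), lon_m = numeric(),
                      stringsAsFactors = FALSE)
  p <- utils::strcapture(pat, x, proto, perl = TRUE)  # non-matching strings give NA rows
  data.frame(
    lat = (p$lat_d + p$lat_m / 60) * ifelse(p$lat_h == "S", -1, 1),
    lon = (p$lon_d + p$lon_m / 60) * ifelse(p$lon_h == "W", -1, 1)
  )
}

df_2 <- data.frame(name = c("exeter_road", "hampton_hill", "portsmouth_ave", "pebble_ln"),
                   GPS_cordinates_DMM = c("N42 58.855 W70 56.473", "N42 58.666 W70 54.981",
                                          "N42 56.579 W70 52.550", "N42 55.949 W70 53.631"))
res2b <- data.frame(name = df_2$name, dmm_to_dec(df_2$GPS_cordinates_DMM))
res2b
```

Strings that do not match the pattern come back as NA rather than raising an error, which makes malformed rows easy to find with is.na. Wrap the result in abs() if you need the unsigned values shown in the question.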