Tags: r, regex, distance, units-of-measurement, geosphere

Tidying messy coordinates for use in measurements


I have some rather messy coordinates in degrees and decimal minutes (the source of which is out of my control), in the format shown below. Ultimately, I am trying to work out the distance between the points.

minlat <- "51  12.93257'"
maxlat <- "66  13.20549'"
minlong <- "- 5   1.23944'"
maxlong <- "- 5   1.36293'"

As they stand, they are in a rather unfriendly format for conv_unit (from the measurements package):

measurements::conv_unit(minlat, from = 'deg_dec_min', to = 'dec_deg')

and ultimately

distm(c(minlong, minlat), c(maxlong, maxlat), fun = distHaversine)

I think I need to use gsub() to get them into a friendlier format, like this:

minlat <- "51 12.93257" # removing the double space
minlong <- "-5 1.23944" # removing the extra spaces and the space after the -

I've been messing around with gsub() all morning and it has beaten me. Any help would be great!


Solution

  • It sounds like you just need to strip all excess whitespace. We can try using gsub with lookarounds here.

    minlong <- " - 5   1.23944 "   # -5 1.23944
    minlong
    gsub("(?<=^|\\D) | (?=$|\\D)", "", gsub("\\s+", " ", minlong), perl=TRUE)
    
    [1] " - 5   1.23944 "
    [1] "-5 1.23944"
    

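    The inner call to gsub replaces every run of whitespace with a single space. The outer call then removes a remaining single space unless it is sandwiched between two digits: a space is dropped if it is preceded by the start of the string or a non-digit, or if it is followed by the end of the string or a non-digit. Only the space separating the degrees from the minutes survives.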
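The strings in the question also end with a minute mark ('), which the example above does not include. As a minimal sketch, assuming that mark should simply be dropped, the two gsub calls can be wrapped in a helper and followed by a base R conversion to decimal degrees. The names clean_ddm and ddm_to_dec are hypothetical, and the arithmetic (degrees plus minutes / 60, with the sign taken from the degrees part) stands in for measurements::conv_unit rather than reproducing it.

```r
# Hypothetical helper (not from the original answer): tidy a messy
# degrees / decimal-minutes string such as "- 5   1.23944'".
clean_ddm <- function(x) {
  x <- gsub("'", "", x, fixed = TRUE)   # assumption: drop the minute mark
  x <- gsub("\\s+", " ", x)             # collapse every run of whitespace
  # remove any space that is not sandwiched between two digits
  gsub("(?<=^|\\D) | (?=$|\\D)", "", x, perl = TRUE)
}

# Hypothetical helper: convert cleaned "deg min" strings to decimal degrees.
ddm_to_dec <- function(x) {
  parts <- strsplit(clean_ddm(x), " ", fixed = TRUE)
  vapply(parts, function(p) {
    deg <- as.numeric(p[1])
    mins <- as.numeric(p[2])
    # take the sign from the text so that "-0 30" stays negative
    sgn <- if (startsWith(p[1], "-")) -1 else 1
    sgn * (abs(deg) + mins / 60)
  }, numeric(1))
}

lat <- ddm_to_dec(c("51  12.93257'", "66  13.20549'"))
lon <- ddm_to_dec(c("- 5   1.23944'", "- 5   1.36293'"))

# geosphere expects points as c(longitude, latitude); run only if installed
if (requireNamespace("geosphere", quietly = TRUE)) {
  print(geosphere::distm(c(lon[1], lat[1]), c(lon[2], lat[2]),
                         fun = geosphere::distHaversine))
}
```

Passing numeric decimal degrees to distm, rather than the original strings, is what lets the Haversine calculation run; the cleaned strings from clean_ddm could equally be handed to conv_unit if that is preferred.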