Search code examples
rtidyrr-leaflet

Collapsing four data frame columns into two, interleaved columns


I am using latitude and longitude data to draw lines on a leaflet map (see below). Ideally the lines would be stored in lat and lng columns in a data frame. In the lat column, each starting point lat value would be followed by an ending point lat value, then by the starting point lat value for another line (the line_id column allows each line to be distinguished). The lng data is arranged similarly. Ideally the data frame should look like this:

> df.better
  line_id      lat        lng
1     ABC 51.50995 -0.1345093
2     ABC 51.51074 -0.1345093
3     XYZ 51.50991 -0.1345193
4     XYZ 51.51079 -0.1351200

The problem is that it comes out of the data store in this format:

> df.wide
  line_id start_lat  end_lat  start_lng  end_lng
1     ABC  51.50995 51.51074 -0.1345093 -0.13519
2     XYZ  51.50991 51.51079  0.1351900  0.13512

This looks a bit like the classic "wide to long" data wrangling problem, for which there are many questions and answers, but a standard "long" format collapses the lat and lng data into one column, and I need two columns. I have tried a tidyverse solution as follows:

df2 <- df.wide %>% pivot_longer(cols = start_lat:end_lng,
                         names_to="variable",
                         values_to="value")

Then I clean up the variable column:

df2$variable <- gsub(".*_lat","lat",df2$variable)
df2$variable <- gsub(".*_lng","lng",df2$variable)

This is the result, which at least seems to have the data in the right order:

> df2
A tibble: 8 x 3
  line_id variable      value
  <fct>   <chr>         <dbl>
1 ABC     lat      51.50995  
2 ABC     lat      51.51074  
3 ABC     lng      -0.1345093
4 ABC     lng      -0.13519  
5 XYZ     lat      51.50991  
6 XYZ     lat      51.51079  
7 XYZ     lng       0.13519  
8 XYZ     lng       0.135120 

The final step would seem to involve spreading the data again, but using pivot_wider results in a complaint about values not being uniquely identified:

df2 %>% pivot_wider(names_from = variable,values_from = value)
    # A tibble: 2 x 3
      line_id         lat         lng
      <fct>   <list<dbl>> <list<dbl>>
    1 ABC             [2]         [2]
    2 XYZ             [2]         [2]
    Warning message:
Values in `value` are not uniquely identified; output will contain list-cols.

I can (I think) see why the error occurs, but providing unique identifiers in variable just takes me back to where I started. How can/should I approach this?

require(magrittr)
require(tidyr)
require(dplyr)

options(pillar.sigfig = 7)

df.better <- data.frame(
  line_id = c("ABC","ABC","XYZ","XYZ"),
  lat = c(51.509950,51.510736,51.509910,51.510786),
  lng = c(-0.1345093,-0.1345093,-0.1345193,-0.135120)
)

df.wide <- data.frame(
  line_id = c("ABC","XYZ"),
  start_lat = c(51.509950,51.509910),
  end_lat = c(51.510736,51.510786),
  start_lng = c(-0.1345093,0.135190),
  end_lng = c(-0.135190,0.135120)
)

df2 <- df.wide %>% pivot_longer(cols = start_lat:end_lng,
                         names_to="variable",
                         values_to="value")

df2$variable <- gsub(".*_lat","lat",df2$variable)
df2$variable <- gsub(".*_lng","lng",df2$variable)

df2 %>% pivot_wider(names_from = variable,values_from = value)

m <- leaflet() %>% setView(lng = -0.1345093, lat = 51.510090, zoom = 18) %>% addTiles()

for (i in unique(df.better$line_id)) { # HT: https://stackoverflow.com/a/44547502/952708
  m <- m %>%
    addPolylines(data = df.better[df.better$line_id == i, ],
                 lng = ~lng, lat = ~lat, color = "Green",
                 opacity = 0.5, weight = 2, dashArray = 5)
}

m

Solution

  • If I understand correctly, you're looking for something like this:

    df.wide <- data.frame(
      line_id = c("ABC","XYZ"),
      start_lat = c(51.509950,51.509910),
      end_lat = c(51.510736,51.510786),
      start_lng = c(-0.1345093,0.135190),
      end_lng = c(-0.135190,0.135120)
    )
    
    df.wide %>% 
      pivot_longer(-line_id,
                   names_to = c("set", ".value"),
                   names_pattern = "(.+)_(.+)"
      )
    
    #  line_id set        lat        lng
    #  <fct>   <chr>    <dbl>      <dbl>
    #1 ABC     start 51.50995 -0.1345093
    #2 ABC     end   51.51074 -0.13519  
    #3 XYZ     start 51.50991  0.13519  
    #4 XYZ     end   51.51079  0.135120