I am using latitude and longitude data to draw lines on a leaflet
map (see below). Ideally the lines would be stored in lat
and lng
columns in a data frame. In the lat
column, each starting point lat value would be followed by an ending point lat value, then by the starting point lat value for another line (the line_id
column allows each line to be distinguished). The lng
data is arranged similarly. Ideally the data frame should look like this:
> df.better
line_id lat lng
1 ABC 51.50995 -0.1345093
2 ABC 51.51074 -0.1345093
3 XYZ 51.50991 -0.1345193
4 XYZ 51.51079 -0.1351200
The problem is that it comes out of the data store in this format:
> df.wide
line_id start_lat end_lat start_lng end_lng
1 ABC 51.50995 51.51074 -0.1345093 -0.13519
2 XYZ 51.50991 51.51079 0.1351900 0.13512
This looks a bit like the classic "wide to long" data wrangling problem, for which there are many questions and answers, but a standard "long" format collapses the lat and lng data into one column, and I need two columns. I have tried a tidyverse solution as follows:
df2 <- df.wide %>% pivot_longer(cols = start_lat:end_lng,
names_to="variable",
values_to="value")
Then I clean up the variable
column:
df2$variable <- gsub(".*_lat","lat",df2$variable)
df2$variable <- gsub(".*_lng","lng",df2$variable)
This is the result, which at least seems to have the data in the right order:
> df2
A tibble: 8 x 3
line_id variable value
<fct> <chr> <dbl>
1 ABC lat 51.50995
2 ABC lat 51.51074
3 ABC lng -0.1345093
4 ABC lng -0.13519
5 XYZ lat 51.50991
6 XYZ lat 51.51079
7 XYZ lng 0.13519
8 XYZ lng 0.135120
The final step would seem to involve spreading the data again, but using pivot_wider
results in a complaint about values not being uniquely identified:
df2 %>% pivot_wider(names_from = variable,values_from = value)
# A tibble: 2 x 3
line_id lat lng
<fct> <list<dbl>> <list<dbl>>
1 ABC [2] [2]
2 XYZ [2] [2]
Warning message:
Values in `value` are not uniquely identified; output will contain list-cols.
I can (I think) see why the error occurs, but providing unique identifiers in variable
just takes me back to where I started. How can/should I approach this?
require(magrittr)
require(tidyr)
require(dplyr)
options(pillar.sigfig = 7)
df.better <- data.frame(
line_id = c("ABC","ABC","XYZ","XYZ"),
lat = c(51.509950,51.510736,51.509910,51.510786),
lng = c(-0.1345093,-0.1345093,-0.1345193,-0.135120)
)
df.wide <- data.frame(
line_id = c("ABC","XYZ"),
start_lat = c(51.509950,51.509910),
end_lat = c(51.510736,51.510786),
start_lng = c(-0.1345093,0.135190),
end_lng = c(-0.135190,0.135120)
)
df2 <- df.wide %>% pivot_longer(cols = start_lat:end_lng,
names_to="variable",
values_to="value")
df2$variable <- gsub(".*_lat","lat",df2$variable)
df2$variable <- gsub(".*_lng","lng",df2$variable)
df2 %>% pivot_wider(names_from = variable,values_from = value)
m <- leaflet() %>% setView(lng = -0.1345093, lat = 51.510090, zoom = 18) %>% addTiles()
for (i in unique(df.better$line_id)) { # HT: https://stackoverflow.com/a/44547502/952708
m <- m %>%
addPolylines(data = df.better[df.better$line_id == i, ],
lng = ~lng, lat = ~lat, color = "Green",
opacity = 0.5, weight = 2, dashArray = 5)
}
m
If I understand correctly, you're looking for something like this:
df.wide <- data.frame(
line_id = c("ABC","XYZ"),
start_lat = c(51.509950,51.509910),
end_lat = c(51.510736,51.510786),
start_lng = c(-0.1345093,0.135190),
end_lng = c(-0.135190,0.135120)
)
df.wide %>%
pivot_longer(-line_id,
names_to = c("set", ".value"),
names_pattern = "(.+)_(.+)"
)
# line_id set lat lng
# <fct> <chr> <dbl> <dbl>
#1 ABC start 51.50995 -0.1345093
#2 ABC end 51.51074 -0.13519
#3 XYZ start 51.50991 0.13519
#4 XYZ end 51.51079 0.135120