I have data on the following format:
# Reproducible example
order <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 4, 5, 6, 7 ,8 ,9 )
values <- c(100, 74 , 70 , 88, 104 ,177 ,88, 189 , 75 , 58, 105, 171 , 29, 60 , 71 , 37 , 93, 99, 206 , 74 , 82 , 69 , 67, 102, 161 , 60 , 92 , 62 ,104, 34, 108, 53 , 50 ,80 , 70 , 77 , 76, 105 ,115 , 78)
journey_id <- c(1, 1, 1 ,1 ,1, 1, 1, 1, 1 ,1, 1, 1, 1, 1, 1 ,1 ,1 ,2, 2, 2 ,2 ,2 ,2, 2, 2, 2, 2, 2, 2, 2 ,2, 2, 2 ,2, 3 ,3 ,3 ,3 ,3 ,3)
df <- data.frame(order, values, journey_id)
Where order
refers to a stop along a route, and values
are observed values of that stop. I would transform this to journey based data, where each row should be an observation of a single journey, where the columns are the order
and the values are taken from values
. All journeys do not necessarily have an observed value for all the stops.
The output should look like this:
## OUTPUT##
# 1 , 2, 3, 4, 5, 6, 7, 8, 9, 10, ..., 20
#journey1 100, 74, 70, 88, 104, 177, 88, 189, 75, 58, ..., 93
#journey2 99, 206, 74, 82, 69, 67, 102, 161, 60, 92, ..., 80
#journey3 70, 77, 76, 105, 115, 78, NA, NA, NA, NA, ..., NA
My data is quite large, so if possible I'd prefer to not loop over the rows in the data frame, but rather use a vectorized solution.
The value associated with a stop is not in all cases unique for a single journey.
With pivot_wider
:
library(tidyr)
library(dplyr)
df %>%
pivot_wider(names_from = "order", values_from = "values")
journey_id 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 100 74 70 88 104 177 88 189 75 58 105 171 29 60 71 37 93
2 2 99 206 74 82 69 67 102 161 60 92 62 104 34 108 53 50 80
3 3 70 77 76 105 115 78 NA NA NA NA NA NA NA NA NA NA NA