I am working with data that looks like this:
CATEGORY PARENT PROPERTY `2018 DOLS` `2018 UNITS` `2017 DOLS` `2017 UNITS`
xxx A P1 100 1 200 2
xxx A P2 NA NA 200 1
xxx B P3 300 1 NA NA
As you can see, observations in the PARENT column may be repeated several times. I would like to covert this data frame into a panel at the PARENT-YEAR level in the following way:
CATEGORY PARENT YEAR `P1 DOLS` `P1 UNITS` `P2 DOLS` `P2 UNITS` `P3 DOLS` `P3 UNITS`
xxx A 2017 200 2 200 1 0 0
xxx A 2018 100 1 0 0 0 0
xxx B 2018 0 0 0 0 300 1
Notice that essentially this amount to invert columns and road between years and PROPERTY entries (while ignoring NAs). I would like to know what would be the most efficient way to carry out this task? Thanks!
Using tidyr
's pivot_longer
and pivot_wider
.
library(tidyr)
df %>%
pivot_longer(cols = matches('^\\d+'),
names_to = c('Year', 'name'),
names_sep = '\\s+',
values_drop_na = TRUE) %>%
pivot_wider(names_from = c(PROPERTY, name), values_from = value,
values_fill = 0, names_sep = ' ')
# CATEGORY PARENT Year `P1 DOLS` `P1 UNITS` `P2 DOLS` `P2 UNITS` `P3 DOLS` `P3 UNITS`
# <chr> <chr> <chr> <int> <int> <int> <int> <int> <int>
#1 xxx A 2018 100 1 0 0 0 0
#2 xxx A 2017 200 2 200 1 0 0
#3 xxx B 2018 0 0 0 0 300 1
data
df <- structure(list(CATEGORY = c("xxx", "xxx", "xxx"), PARENT = c("A",
"A", "B"), PROPERTY = c("P1", "P2", "P3"), `2018 DOLS` = c(100L,
NA, 300L), `2018 UNITS` = c(1L, NA, 1L), `2017 DOLS` = c(200L,
200L, NA), `2017 UNITS` = c(2L, 1L, NA)),
class = "data.frame", row.names = c(NA, -3L))