Search code examples
rdataframepanel

Invert columns/Change unit of observation


I am working with data that looks like this:

 CATEGORY PARENT PROPERTY `2018 DOLS` `2018 UNITS` `2017 DOLS` `2017 UNITS`
      xxx     A       P1       100            1          200          2
      xxx     A       P2       NA             NA         200          1
      xxx     B       P3       300            1          NA           NA

As you can see, observations in the PARENT column may be repeated several times. I would like to covert this data frame into a panel at the PARENT-YEAR level in the following way:

CATEGORY PARENT   YEAR    `P1 DOLS` `P1 UNITS` `P2 DOLS` `P2 UNITS` `P3 DOLS` `P3 UNITS`
   xxx     A       2017       200      2          200          1      0           0
   xxx     A       2018       100      1           0           0      0           0
   xxx     B       2018         0      0           0           0     300          1

Notice that essentially this amount to invert columns and road between years and PROPERTY entries (while ignoring NAs). I would like to know what would be the most efficient way to carry out this task? Thanks!


Solution

  • Using tidyr's pivot_longer and pivot_wider.

    library(tidyr)
    
    df %>%
      pivot_longer(cols = matches('^\\d+'), 
                   names_to = c('Year', 'name'), 
                   names_sep = '\\s+', 
                   values_drop_na = TRUE) %>%
      pivot_wider(names_from = c(PROPERTY, name), values_from = value, 
                  values_fill = 0, names_sep = ' ')
    
    # CATEGORY PARENT Year  `P1 DOLS` `P1 UNITS` `P2 DOLS` `P2 UNITS` `P3 DOLS` `P3 UNITS`
    #  <chr>    <chr>  <chr>     <int>      <int>     <int>      <int>     <int>      <int>
    #1 xxx      A      2018        100          1         0          0         0          0
    #2 xxx      A      2017        200          2       200          1         0          0
    #3 xxx      B      2018          0          0         0          0       300          1
    

    data

    df <- structure(list(CATEGORY = c("xxx", "xxx", "xxx"), PARENT = c("A", 
    "A", "B"), PROPERTY = c("P1", "P2", "P3"), `2018 DOLS` = c(100L, 
    NA, 300L), `2018 UNITS` = c(1L, NA, 1L), `2017 DOLS` = c(200L, 
    200L, NA), `2017 UNITS` = c(2L, 1L, NA)), 
    class = "data.frame", row.names = c(NA, -3L))