Search code examples
rdata-manipulationpanel-data

Restructuring Panel Data in R - Years as Columns


In short:
What I have:

ID  Name  wage2010  wage2011  wage2012  kids2010  kids2011  kids2012 
1  Chris   20,000    18,000    21,000     2          2        2
2  Pat     40,000    45,000    45,000     1          1        2
3  Susan   10,000    18,000    15,000     0          1        1

What I want:

ID  Name  year  wage  kids 
1   Chris 2010  20,000  2
1   Chris 2011  18,000  2
1   Chris 2012  21,000  2
2   Pat   2010  40,000  1
2   Pat   2011  45,000  1
2   Pat   2012  45,000  2
3   Susan 2010  10,000  0
3   Susan 2011  18,000  1
3   Susan 2012  15,000  1

Any help suggestions? Thanks in avdance.


Solution

  • We can use melt from data.table

    library(data.table)
    melt(setDT(df1), measure = patterns('^wage', '^kids'),
        variable.name = 'year', value.name = c('wage', 'kids'))
    

    Or with pivot_longer from tidyr

    library(tidyr)
    pivot_longer(df1, cols = -c(ID, Name), 
       names_to = c( '.value', 'year'), names_sep = "(?<=[a-z])(?=[0-9])")
    # A tibble: 9 x 5
    #     ID Name  year  wage    kids
    #  <int> <chr> <chr> <chr>  <int>
    #1     1 Chris 2010  20,000     2
    #2     1 Chris 2011  18,000     2
    #3     1 Chris 2012  21,000     2
    #4     2 Pat   2010  40,000     1
    #5     2 Pat   2011  45,000     1
    #6     2 Pat   2012  45,000     2
    #7     3 Susan 2010  10,000     0
    #8     3 Susan 2011  18,000     1
    #9     3 Susan 2012  15,000     1
    

    data

    df1 <- structure(list(ID = 1:3, Name = c("Chris", "Pat", "Susan"),
    wage2010 = c("20,000", 
    "40,000", "10,000"), wage2011 = c("18,000", "45,000", "18,000"
    ), wage2012 = c("21,000", "45,000", "15,000"), kids2010 = 2:0, 
        kids2011 = c(2L, 1L, 1L), kids2012 = c(2L, 2L, 1L)), 
        class = "data.frame", row.names = c(NA, 
    -3L))