Search code examples
rdata-cleaning

rearrange data in a specific structure


I have data like this format:

state year1 year 2
First 2000 2004-2005
Second 2007 2010-2011
Third 2008 2010
Third 2010 2012

I want to make this:

state year
First 2000
First 2004-2005
Second 2007
Second 2010-2011
Third 2008
Third 2010
Third 2012

The code can be in R or Python. Thanks in advance


Solution

  • There is a function in the data.table package, called melt( ) which allows you to convert data from wide to long format. In this case I am keeping State as my ID variable and the variables I would like pulled into my value field are Year1 and Year2. There is a line that keeps unique observations to remove duplicates.

    library(data.table)
    
    data <- data.table(
      State = c("First","Second","Third","Third"),
      Year1 = c("2000","2007","2008","2010"),
      Year2 = c("2004-2005","2010-2011","2010","2012"))
    
    data
        State Year1     Year2
    1:  First  2000 2004-2005
    2: Second  2007 2010-2011
    3:  Third  2008      2010
    4:  Third  2010      2012
    
    data2 <- melt(
      data = data,
      id.vars = c("State"),
      measure.vars = c("Year1","Year2"),
      variable.name = "Year",
      value.name = "years")
    
    data2 <- unique(data2)
    
    data2[order(State),.(State,years)]
        State     years
    1:  First      2000
    2:  First 2004-2005
    3: Second      2007
    4: Second 2010-2011
    5:  Third      2008
    6:  Third      2010
    7:  Third      2010
    8:  Third      2012