Search code examples
rstata

ID matching in panel data


My data is organized in the following way:

ID  Year  ID2012   var1   var2
1   2012     .
A   2014     1
A   2016     .
2   2012     .
B   2014     2
B   2016     .

For year 2014, I know the correspondence between ID2012 and ID2014, and I know that ID2016 is coded in the same way as ID2014. Now I want to make ID across the years consistent to get a panel: either A is replaced by 1 or the other way round.

ID  Year  ID2012   var1   var2
A   2012     .
A   2014     1
A   2016     .
B   2012     .
B   2014     2
B   2016     .

or

ID  Year  ID2012   var1   var2
1   2012     .
1   2014     1
1   2016     .
2   2012     .
2   2014     2
2   2016     .

There is no arithmetic rule between ID2012 and ID2014 that can be utilized.


Solution

  • Here's one way to do it:

    # Read in the data
    df <- read.table(text = "ID  Year  ID2012 var1
                             1   2012     .    9.1
                             A   2014     1    2.2
                             A   2016     .    1.0
                             2   2012     .    4.8
                             B   2014     2    4.7
                             B   2016     .    6.1", 
                     header = T, stringsAsFactors = F)
    
    # Find the positions where ID2012 tells us what ID to replace it with
    ID.map <- match(df$ID, df$ID2012) # [1]  2 NA NA  5 NA NA
    
    # Replace values in the ID column where they are defined in the map
    df$ID[!is.na(ID.map)] <- df[ID.map[!is.na(ID.map)], 'ID']
    
    # Result:
    #   ID Year ID2012 var1
    # 1  A 2012      .  9.1
    # 2  A 2014      1  2.2
    # 3  A 2016      .  1.0
    # 4  B 2012      .  4.8
    # 5  B 2014      2  4.7
    # 6  B 2016      .  6.1