Search code examples
rmergedata.tabledcast

Conditional merge, based an event happening between two panel observations


I have a panel dataset: panel and a dataset with a list of events: Events. For the panel dataset, an equal panelID shows that two observations belong together.

panelID = c(1:50)   
year= c(2001:2010)
country = c("NLD", "GRC", "GBR")

n <- 2

library(data.table)
set.seed(123)
Panel <- data.table(panelID = rep(sample(panelID), each = n),
                 country = rep(sample(country, length(panelID), replace = T), each = n),
                 year = c(replicate(length(panelID), sample(year, n))),
                 some_NA = sample(0:5, 6),                                             
                 some_NA_factor = sample(0:5, 6),         
                 norm = round(runif(100)/10,2),
                 Income = round(rnorm(10,-5,5),2),
                 Happiness = sample(10,10),
                 Sex = round(rnorm(10,0.75,0.3),2),
                 Age = sample(100,100),
                 Educ = round(rnorm(10,0.75,0.3),2))        
Panel[, uniqueID := .I]                                                                        # Creates a unique ID     
Panel[Panel == 0] <- NA    


Events <- fread(
"Event_Type  country year   
A   NLD   2005
C   NLD   2004       
A   GBR   2006
B   GBR   2003   
A   GRC   2002             
D   GRC   2007",
header = TRUE)

================================================================================ EDIT:

Events <- fread(
"Event_Type  country year   
A   NLD   2005
A   NLD   2004       
A   GBR   2006
A   GBR   2003   
A   GRC   2002             
A   GRC   2007",
header = TRUE)

EDITED DESIRED OUTCOME:

panleID country year 2002  2003  2004 2005 2006 2007 
1       NLD     2002 NA    NA    1    1    NA   NA 
1       NLD     2006 NA    NA    1    1    NA   NA 

=======================================================================

I would like the value in the column Event_Type to be added to Panel, if the year of the event is between two panel observation (and in the same country).

As an example, let's take the following panel observation:

panleID country year
1       NLD     2002
1       NLD     2006

Panel will get 4 extra columns A to D. The column A, will get a 1 in the column if the event in country NLD in the year 2005 (first line Events, takes place in one of or between the two years. Since it does this results in the following:

panleID country year A  B  C  D 
1       NLD     2002 1  NA NA NA
1       NLD     2006 1  NA NA NA

I know that merging with the same year goes as follows:

merge(Panel, dcast(Events, iso + country ~ Event_Type),
      by = c("country", "year"))

But how should I make the merge if I want the values to be equal to or in between the two panelID years?


Solution

  • Here is a go at your problem using data.table The code can be shortened, but I always find it useful (expecially on SO) to show all steps in between for easy error-checking and validation.

    #first, summarise Panel, to get the time-span of the panelID
    Panel.short <- Panel[, .(country = unique(country), 
                             start = min(year), 
                             end = max(year) ), 
                         by = .(panelID)]
    #    panelID country start  end
    # 1:       1     NLD  2002 2006
    
    #perform left non-equi join
    Panel.short.joined <- Events[ Panel.short, on =.(country, year >= start, year <= end), mult = "all"][]
    #    Event_Type country year year.1 panelID
    # 1:          A     NLD 2002   2006       1
    # 2:          C     NLD 2002   2006       1
    
    #cast to wide
    Panel.final <- dcast( Panel.short.joined, 
           panelID + country ~ Event_Type, 
           fun.aggregate = length )
    #    panelID country A C
    # 1:       1     NLD 1 1
    
    #perform update join on the original Panel
    Panel[, `:=`(A=0, B=0, C=0, D=0)][ 
      Panel.final, 
      `:=`( A = i.A, C = i.C),   # <- add B = i.B and D = i.D here 
      on = .( panelID )][]
    #    panelID country year A B C D
    # 1:       1     NLD 2002 1 0 1 0
    # 2:       1     NLD 2006 1 0 1 0