I define a first occurrence as observing a value in the current period but not in the period before (1 period is equivalent to 1 year).
In the following example, I want to check whether the company's product, V3 (third column), is observed for the first time (following the definition above), taking into account the time variable, V2 (second column).
a <- as.data.frame(matrix(c(1,1,1,1,1,2005,2006,2007,2009,2010, "A", "B", "A", "A", "A"), ncol = 3))
I want to create a new indicator column (V4 in the solution below) that flags a first occurrence, i.e. the product is observed in the current period but was not observed in the period before:
b <- as.data.frame(matrix(c(1,1,1,1,1,2005,2006,2007,2009,2010, "A", "B", "A", "A", "A","TRUE", "TRUE", "TRUE", "TRUE", "FALSE"), ncol = 4))
I have tried the min() function as well as some crazy loops, but I did not come up with an appropriate solution.
Note: V1 represents the company id. In my database I have thousands of different companies.
Any clue?
Regards
An option using data.table:
library(data.table)
DT <- fread("Company Year Product
1 2005 A
1 2006 B
1 2007 A
1 2009 A
1 2010 A")
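# Year in which the product would have had to appear to count as "seen in the previous period"
DT[, yearBef := Year - 1L]
# Self-join: look up each row's Company/Product in the previous year.
# A match gives x.Year == i.Year - 1, so the comparison is FALSE; no match gives NA, which fcoalesce() turns into TRUE.
DT[, NotInLastYear := DT[DT, on=.(Company, Product, Year=yearBef),
    fcoalesce(x.Year==i.Year, TRUE)]]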
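For comparison, here is a minimal sketch of another way to get the same flag without a self-join, using shift() within each Company/Product group. It is not part of the answer above. It assumes at most one row per company, product and year, and the column name FirstOccurrence is only illustrative.

```r
library(data.table)

# Same toy data as in the question, with named columns
DT <- data.table(
  Company = c(1L, 1L, 1L, 1L, 1L),
  Year    = c(2005L, 2006L, 2007L, 2009L, 2010L),
  Product = c("A", "B", "A", "A", "A")
)

# Sort so that shift() returns the previous observed year of the same product
setorder(DT, Company, Product, Year)

# First occurrence: no earlier observation at all, or the previous
# observation of this product is not exactly one year back.
# Assumption: one row per Company/Product/Year (no duplicates).
DT[, FirstOccurrence := is.na(shift(Year)) | Year - shift(Year) != 1L,
   by = .(Company, Product)]

# Restore chronological order within each company
setorder(DT, Company, Year)
DT$FirstOccurrence
# TRUE TRUE TRUE TRUE FALSE
```

With thousands of companies this runs in a single grouped pass. If duplicate Company/Product/Year rows can occur, deduplicate them first, since a repeated year would give a difference of 0 and be flagged as a first occurrence.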