
How can I flag first occurrences with TRUE/FALSE in R (observed in the current period but not in the one before)?


I define a first occurrence as a value that is observed in the current period but was not observed in the period before (one period is one year).

So, in the following example, I want to check whether the company's product, V3 (third column), is observed for the first time (according to the definition above), given the time variable V2 (second column).

# data.frame() keeps V1 and V2 numeric; matrix() would coerce every column to character
a <- data.frame(V1 = c(1, 1, 1, 1, 1),
                V2 = c(2005, 2006, 2007, 2009, 2010),
                V3 = c("A", "B", "A", "A", "A"))

I want to create a new indicator column (V4 in the expected result below) that flags the first occurrence, i.e. a product observed in the current period that was not observed in the period before:

b <- data.frame(V1 = c(1, 1, 1, 1, 1),
                V2 = c(2005, 2006, 2007, 2009, 2010),
                V3 = c("A", "B", "A", "A", "A"),
                V4 = c(TRUE, TRUE, TRUE, TRUE, FALSE))

I have tried the min() function as well as some convoluted loops, but I could not come up with a working solution.

Note: V1 is the company ID. My database contains thousands of different companies, so the check has to be done within each company.
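A minimal base R sketch of this definition, for illustration only: it builds one text key per observed (company, product, year) and checks whether the same company and product also appear in the previous year. The second company (V1 = 2) is hypothetical data added to show that including the company in the key keeps companies apart, and the "\r" separator is an assumption that it never occurs inside IDs or product names.

    # Question's data plus a hypothetical company 2
    a2 <- data.frame(V1 = c(1, 1, 1, 1, 1, 2, 2),
                     V2 = c(2005, 2006, 2007, 2009, 2010, 2005, 2006),
                     V3 = c("A", "B", "A", "A", "A", "A", "A"),
                     stringsAsFactors = FALSE)

    # Key for each observed row, and key for the same company/product one year earlier
    # (assumes "\r" does not appear in company IDs or product names)
    key_now  <- paste(a2$V1, a2$V3, a2$V2,     sep = "\r")
    key_prev <- paste(a2$V1, a2$V3, a2$V2 - 1, sep = "\r")

    # First occurrence: the previous-year key was never observed
    a2$V4 <- !(key_prev %in% key_now)
    a2

Because %in% is vectorised over all rows at once, no loop over companies is needed.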

Any clue?

Regards


Solution

  • An option using data.table:

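    library(data.table)

    # Example data: one company (1) and its products over the years
    DT <- fread(text = "Company Year Product
    1 2005  A
    1 2006  B
    1 2007  A
    1 2009  A
    1 2010  A")
    
    # Year in which the same company/product must appear for the
    # current row NOT to count as a first occurrence
    DT[, yearBef := Year - 1L]

    # Self-join on (Company, Product, Year = yearBef): a match means the product
    # was seen the year before, and x.Year == i.Year is then FALSE; no match gives
    # NA, which fcoalesce() replaces with TRUE
    DT[, NotInLastYear := DT[DT, on = .(Company, Product, Year = yearBef),
        fcoalesce(x.Year == i.Year, TRUE)]]
    DT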
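If data.table is not available, a sort-and-compare approach in base R is one possible alternative (a sketch, not the answer's method): sort by company, product and year, then mark a row as seen last year only when the previous sorted row has the same company and product and a year exactly one lower. It assumes each (Company, Product, Year) combination appears at most once.

    df <- data.frame(Company = c(1, 1, 1, 1, 1),
                     Year    = c(2005, 2006, 2007, 2009, 2010),
                     Product = c("A", "B", "A", "A", "A"),
                     stringsAsFactors = FALSE)

    # Sort so each row's predecessor is the previous observation of the same company/product
    ord <- order(df$Company, df$Product, df$Year)
    s   <- df[ord, ]
    n   <- nrow(s)

    # TRUE where the previous sorted row belongs to the same company and product
    same_group <- c(FALSE, s$Company[-1] == s$Company[-n] &
                           s$Product[-1] == s$Product[-n])
    prev_year  <- c(NA, s$Year[-n])

    # Seen last year only if the predecessor is exactly one year earlier,
    # so a gap such as 2007 -> 2009 counts as a new first occurrence
    seen_last_year <- same_group & prev_year == s$Year - 1

    # Write the flag back in the original row order
    df$NotInLastYear <- NA
    df$NotInLastYear[ord] <- !seen_last_year
    df

For the question's data this should give the same TRUE, TRUE, TRUE, TRUE, FALSE pattern as the data.table join.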