Search code examples
rtimedummy-variable

How to find observations whose dummy variable changes from 1 to 0 (and not viceversa) in a df in r


I have a survey composed of n individuals; each individual is present more than one time in the survey (panel). I have a variable pens, which is a dummy that takes value 1 if the individual invests in a complementary pension form. For example:

df <- data.frame(year=c(2002,2002,2004,2004,2006,2008), id=c(1,2,1,2,3,3), y.b=c(1950,1943,1950,1943,1966,1966), sex=c("F", "M", "F", "M", "M", "M"), income=c(100000,55000,88000,66000,12000,24000), pens=c(0,1,1,0,1,1))

year  id  y.b   sex   income   pens   
2002  1   1950   F    100000     0     
2002  2   1943   M    55000      1    
2004  1   1950   F    88000      1    
2004  2   1943   M    66000      0    
2006  3   1966   M    12000      1    
2008  3   1966   M    24000      1    

where id is the individual, y.b is year of birth, pens is the dummy variable regarding complementary pension.

I want to know if there are individuals that invested in a complementary pension form in year t but didn't hold the complementary pension form in year t+2 (the survey is conducted every two years). In this way I want to know how many person had a complementary pension form but released it before pension or gave up (for example for economic reasons).

I tried with this command:

df$x <- (ave(df$pens, df$id, FUN = function(x)length(unique(x)))==1)*1
which(df$x=="0")

and actually I have the individuals whose pens variable had changed during time (the command check if a variable is constant in time). For this reason I find individuals whose pens variable changed from 0 (didn't have complementary pension) in year t to 1 in year t+2 and viceversa; but I am interested in individuals whose pens variable was 1 (had a complementary pensione) in year t and 0 in year t+2.

If I use this command with the df I get that for id 1 and 2 the variable x is 0 (pens variable isn't constant), but I'd need to find a way to get just id 2 (whose pens variable changed from 1 to 0).

df$x <- (ave(df$pens, df$id, FUN = function(x)length(unique(x)))==1)*1
which(df$x=="0")

  year id pens x
1 2002  1    0 0
2 2002  2    1 0
3 2004  1    1 0
4 2004  2    0 0
5 2006  3    1 1
6 2008  3    1 1

(for the sake of semplicity I omitted other variables)

So the desired output is:

  year id pens x
1 2002  1    0 1
2 2002  2    1 0
3 2004  1    1 1
4 2004  2    0 0
5 2006  3    1 1
6 2008  3    1 1

only id 2 has x=0 since the pens variable changed from 1 to 0.

Thanks in advance


Solution

  • This assigns 1 to the id's for which there is a decline in pens and 0 otherwise.

    transform(d.d, x = ave(pens, id, FUN = function(x) any(diff(x) < 0)))
    

    giving:

      year id  y.b sex income pens   x
    1 2002  1 1950   F 100000    0   0
    2 2002  2 1943   M  55000    1   1
    3 2004  1 1950   F  88000    1   0
    4 2004  2 1943   M  66000    0   1
    5 2006  3 1966   M  12000    1   0
    6 2008  3 1966   M  24000    1   0
    

    This should work even even if there are more than 2 rows per id but if we knew there were always 2 rows then we could omit the any simplifying it to:

    transform(d.d, x = ave(pens, id, FUN = diff) < 0)
    

    Note: The input in reproducible form is:

    Lines <- "year  id  y.b   sex   income   pens   
    2002  1   1950   F    100000     0     
    2002  2   1943   M    55000      1    
    2004  1   1950   F    88000      1    
    2004  2   1943   M    66000      0    
    2006  3   1966   M    12000      1    
    2008  3   1966   M    24000      1"
    
    d.d <- read.table(text = Lines, header = TRUE, check.names = FALSE)