I have a survey composed of n individuals; each individual is present more than one time in the survey (panel). I have a variable pens, which is a dummy that takes value 1 if the individual invests in a complementary pension form. For example:
df <- data.frame(year=c(2002,2002,2004,2004,2006,2008), id=c(1,2,1,2,3,3), y.b=c(1950,1943,1950,1943,1966,1966), sex=c("F", "M", "F", "M", "M", "M"), income=c(100000,55000,88000,66000,12000,24000), pens=c(0,1,1,0,1,1))
year id y.b sex income pens
2002 1 1950 F 100000 0
2002 2 1943 M 55000 1
2004 1 1950 F 88000 1
2004 2 1943 M 66000 0
2006 3 1966 M 12000 1
2008 3 1966 M 24000 1
where id is the individual, y.b is year of birth, pens is the dummy variable regarding complementary pension.
I want to know if there are individuals that invested in a complementary pension form in year t but didn't hold the complementary pension form in year t+2 (the survey is conducted every two years). In this way I want to know how many person had a complementary pension form but released it before pension or gave up (for example for economic reasons).
I tried with this command:
df$x <- (ave(df$pens, df$id, FUN = function(x)length(unique(x)))==1)*1
which(df$x=="0")
and actually I have the individuals whose pens variable had changed during time (the command check if a variable is constant in time). For this reason I find individuals whose pens variable changed from 0 (didn't have complementary pension) in year t to 1 in year t+2 and viceversa; but I am interested in individuals whose pens variable was 1 (had a complementary pensione) in year t and 0 in year t+2.
If I use this command with the df I get that for id 1 and 2 the variable x is 0 (pens variable isn't constant), but I'd need to find a way to get just id 2 (whose pens variable changed from 1 to 0).
df$x <- (ave(df$pens, df$id, FUN = function(x)length(unique(x)))==1)*1
which(df$x=="0")
year id pens x
1 2002 1 0 0
2 2002 2 1 0
3 2004 1 1 0
4 2004 2 0 0
5 2006 3 1 1
6 2008 3 1 1
(for the sake of semplicity I omitted other variables)
So the desired output is:
year id pens x
1 2002 1 0 1
2 2002 2 1 0
3 2004 1 1 1
4 2004 2 0 0
5 2006 3 1 1
6 2008 3 1 1
only id 2 has x=0 since the pens variable changed from 1 to 0.
Thanks in advance
This assigns 1 to the id's for which there is a decline in pens
and 0 otherwise.
transform(d.d, x = ave(pens, id, FUN = function(x) any(diff(x) < 0)))
giving:
year id y.b sex income pens x
1 2002 1 1950 F 100000 0 0
2 2002 2 1943 M 55000 1 1
3 2004 1 1950 F 88000 1 0
4 2004 2 1943 M 66000 0 1
5 2006 3 1966 M 12000 1 0
6 2008 3 1966 M 24000 1 0
This should work even even if there are more than 2 rows per id but if we knew there were always 2 rows then we could omit the any
simplifying it to:
transform(d.d, x = ave(pens, id, FUN = diff) < 0)
Note: The input in reproducible form is:
Lines <- "year id y.b sex income pens
2002 1 1950 F 100000 0
2002 2 1943 M 55000 1
2004 1 1950 F 88000 1
2004 2 1943 M 66000 0
2006 3 1966 M 12000 1
2008 3 1966 M 24000 1"
d.d <- read.table(text = Lines, header = TRUE, check.names = FALSE)