I have a dataset with outcome variable - owl weight[y] and time in years when the outcome was recorded[t]. There is a third variable what type of feed was provided to the baby owls[w] and 4th column what year the new feed was introduced[a].
y t w a
-1.354 1994 Zi 2015
2.146 1994 Zi 2015
3.146 2001 Zi 2015
2.646 2011 Zi 2015
-2.354 2015 Zi 2015
-0.854 2016 Zi 2015
-3.854 2017 Zi 2015
-2.354 2021 Zi 2015
-2.354 2001 Ca 2010
1.146 2002 Ca 2010
-2.354 2003 Ca 2010
-0.854 2005 Ca 2010
-6.354 2011 Ca 2010
2.146 2019 Ca 2010
Step1 : I like to create a time to event variable
column from the index a
column, like this, t-a
=time to event variable
y t w a TimeToEvent
-1.354 1994 Zi 2015 -21
2.146 1994 Zi 2015 -21
3.146 2001 Zi 2015 -14
2.646 2011 Zi 2015 -4
-2.354 2015 Zi 2015 0
-0.854 2016 Zi 2015 1
-3.854 2017 Zi 2015 2
-2.354 2021 Zi 2015 6
-2.354 2001 Ca 2010 -9
1.146 2002 Ca 2010 -8
-2.354 2003 Ca 2010 -7
-0.854 2005 Ca 2010 -5
1.147 2010 Ca 2010 0
-6.354 2011 Ca 2010 1
2.146 2019 Ca 2010 9
Step2 : I like to create 4 lag
and 4 lead
variables like this
y t w a TimeToEvent Lg4 Lg3 Lg2 Lg1 Index Ld1 Ld2 Ld3 Ld4
-1.354 1994 Zi 2015 -21 1 0 0 0 0 0 0 0 0
2.146 1998 Zi 2015 -17 1 0 0 0 0 0 0 0 0
-3.146 2001 Zi 2015 -14 1 0 0 0 0 0 0 0 0
2.646 2011 Zi 2015 -4 1 0 0 0 0 0 0 0 0
0.133 2013 Zi 2015 -2 0 0 1 0 0 0 0 0 0
-2.354 2015 Zi 2015 0 0 0 0 0 1 0 0 0 0
-0.854 2016 Zi 2015 1 0 0 0 0 0 1 0 0 0
-3.854 2017 Zi 2015 2 0 0 0 0 0 0 1 0 0
-2.354 2021 Zi 2015 6 0 0 0 0 0 0 0 0 1
-2.354 2001 Ca 2010 -9 1 0 0 0 0 0 0 0 0
1.146 2002 Ca 2010 -8 1 0 0 0 0 0 0 0 0
-2.354 2003 Ca 2010 -7 1 0 0 0 0 0 0 0 0
-0.854 2005 Ca 2010 -5 1 0 0 0 0 0 0 0 0
1.147 2010 Ca 2010 0 0 0 0 0 1 0 0 0 0
-6.354 2011 Ca 2010 1 0 0 0 0 0 1 0 0 0
2.146 2019 Ca 2010 9 0 0 0 0 0 0 0 0 1
As you can see, I am only interested in tracking the lead and lag for 4 time periods before the index and after the index year [column a].
y t w a TimeToEvent Lg4 Lg3 Lg2 Lg1 Index Ld1 Ld2 Ld3 Ld4
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
2.646 2011 Zi 2015 -4 1 0 0 0 0 0 0 0 0
0.133 2013 Zi 2015 -2 0 0 1 0 0 0 0 0 0
-2.354 2015 Zi 2015 0 0 0 0 0 1 0 0 0 0
-0.854 2016 Zi 2015 1 0 0 0 0 0 1 0 0 0
-3.854 2017 Zi 2015 2 0 0 0 0 0 0 1 0 0
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
1.147 2010 Ca 2010 0 0 0 0 0 1 0 0 0 0
-6.354 2011 Ca 2010 1 0 0 0 0 0 1 0 0 0
Any observations occurring outside the 4 lead or 4 lag periods are assigned 1 depending on whether they are t < a or t > a for example the first three observations time to event
is -12,-17 and -14 so these observations will be assigned 1 in Lg4 column (boundary) similarly first 4 observations of group(wa)=Ca is -9,-8,-7,-5 so they are assigned 1 in the in Lg4 column
y t w a TimeToEvent Lg4 Lg3 Lg2 Lg1 Index Ld1 Ld2 Ld3 Ld4
-1.354 1994 Zi 2015 -21 1 0 0 0 0 0 0 0 0
2.146 1998 Zi 2015 -17 1 0 0 0 0 0 0 0 0
-3.146 2001 Zi 2015 -14 1 0 0 0 0 0 0 0 0
-2.354 2001 Ca 2010 -9 1 0 0 0 0 0 0 0 0
1.146 2002 Ca 2010 -8 1 0 0 0 0 0 0 0 0
-2.354 2003 Ca 2010 -7 1 0 0 0 0 0 0 0 0
-0.854 2005 Ca 2010 -5 1 0 0 0 0 0 0 0 0
Observations with time to event
outside lead will be assigned 1 on the other end( column Ld4)
y t w a TimeToEvent Lg4 Lg3 Lg2 Lg1 Index Ld1 Ld2 Ld3 Ld4
-2.354 2021 Zi 2015 6 0 0 0 0 0 0 0 0 1
2.146 2019 Ca 2010 9 0 0 0 0 0 0 0 0 1
Final Expected dataset as shown in Step2 above
y t w a TimeToEvent Lg4 Lg3 Lg2 Lg1 Index Ld1 Ld2 Ld3 Ld4
-1.354 1994 Zi 2015 -21 1 0 0 0 0 0 0 0 0
2.146 1998 Zi 2015 -17 1 0 0 0 0 0 0 0 0
-3.146 2001 Zi 2015 -14 1 0 0 0 0 0 0 0 0
2.646 2011 Zi 2015 -4 1 0 0 0 0 0 0 0 0
0.133 2013 Zi 2015 -2 0 0 1 0 0 0 0 0 0
-2.354 2015 Zi 2015 0 0 0 0 0 1 0 0 0 0
-0.854 2016 Zi 2015 1 0 0 0 0 0 1 0 0 0
-3.854 2017 Zi 2015 2 0 0 0 0 0 0 1 0 0
-2.354 2021 Zi 2015 6 0 0 0 0 0 0 0 0 1
-2.354 2001 Ca 2010 -9 1 0 0 0 0 0 0 0 0
1.146 2002 Ca 2010 -8 1 0 0 0 0 0 0 0 0
-2.354 2003 Ca 2010 -7 1 0 0 0 0 0 0 0 0
-0.854 2005 Ca 2010 -5 1 0 0 0 0 0 0 0 0
1.147 2010 Ca 2010 0 0 0 0 0 1 0 0 0 0
-6.354 2011 Ca 2010 1 0 0 0 0 0 1 0 0 0
2.146 2019 Ca 2010 9 0 0 0 0 0 0 0 0 1
Any suggestions regarding this is much appreciated. Thanks.
It seems that you are only testing upon your time to event so a bunch of ifelse
will do the trick.
If you want to replace missing values there, I suggest you use dplyr::across()
to apply tidyr::replace_na()
to all those columns. You could also use dplyr::case_when()
instead of ifelse()
.
Also, for your future questions, please try to provide the dataset in the form of pasteable code such as in my answer (or even better using dput()
).
Here is the code:
library(tidyverse)
df=read.table(header=T, text="
y t w a
-1.354 1994 Zi 2015
2.146 1994 Zi 2015
3.146 2001 Zi 2015
2.646 2011 Zi 2015
-2.354 2015 Zi 2015
-0.854 2016 Zi 2015
-3.854 2017 Zi 2015
-2.354 2021 Zi 2015
-2.354 2001 Ca 2010
1.146 2002 Ca 2010
-2.354 2003 Ca 2010
-0.854 2005 Ca NA
-6.354 2011 Ca 2010
2.146 2019 Ca 2010")
df %>%
mutate(
tte=t-a,
Lg4=ifelse(tte<=-4, 1, 0),
Lg3=ifelse(tte==-3, 1, 0),
Lg2=ifelse(tte==-2, 1, 0),
Lg1=ifelse(tte==-1, 1, 0),
Index=ifelse(tte==0, 1, 0),
Ld1=ifelse(tte==1, 1, 0),
Ld2=ifelse(tte==2, 1, 0),
Ld3=ifelse(tte==3, 1, 0),
Ld4=ifelse(tte>=4, 1, 0),
across(Lg4:Ld4, replace_na, replace=0)
)
#> y t w a tte Lg4 Lg3 Lg2 Lg1 Index Ld1 Ld2 Ld3 Ld4
#> 1 -1.354 1994 Zi 2015 -21 1 0 0 0 0 0 0 0 0
#> 2 2.146 1994 Zi 2015 -21 1 0 0 0 0 0 0 0 0
#> 3 3.146 2001 Zi 2015 -14 1 0 0 0 0 0 0 0 0
#> 4 2.646 2011 Zi 2015 -4 1 0 0 0 0 0 0 0 0
#> 5 -2.354 2015 Zi 2015 0 0 0 0 0 1 0 0 0 0
#> 6 -0.854 2016 Zi 2015 1 0 0 0 0 0 1 0 0 0
#> 7 -3.854 2017 Zi 2015 2 0 0 0 0 0 0 1 0 0
#> 8 -2.354 2021 Zi 2015 6 0 0 0 0 0 0 0 0 1
#> 9 -2.354 2001 Ca 2010 -9 1 0 0 0 0 0 0 0 0
#> 10 1.146 2002 Ca 2010 -8 1 0 0 0 0 0 0 0 0
#> 11 -2.354 2003 Ca 2010 -7 1 0 0 0 0 0 0 0 0
#> 12 -0.854 2005 Ca NA 0 0 0 0 0 0 0 0 0 0
#> 13 -6.354 2011 Ca 2010 1 0 0 0 0 0 1 0 0 0
#> 14 2.146 2019 Ca 2010 9 0 0 0 0 0 0 0 0 1
Created on 2021-12-27 by the reprex package (v2.0.1)