Suppose this is my dataset
State Enter Event Lag
State-A 2000 2004 -4
State-A 2001 2004 -3
State-A 2002 2004 -2
State-A 2003 2004 -1
State-A 2004 2004 0
State-A 2005 2004 1
State-A 2006 2004 2
State-A 2007 2004 3
State-A 2008 2004 4
State-A 2009 2004 5
State-B 2000 2004 -5
State-B 2001 2004 -4
State-B 2002 2004 -3
State-B 2003 2004 -2
State-B 2004 2004 -1
State-B 2005 2004 0
State-B 2006 2004 1
State-B 2007 2004 2
State-B 2008 2004 3
State-B 2009 2004 4
My goal is the create several indicator columns based on the values in column Lag
and user defined Event window.
If the user defined event window is between 2001 and 2008 then the expected dataset is to be like this
State Enter Event Lag LagM3_2001 LagM2_2002 LagM1_2003 Lead0_2004 Lead1_2005 Lead2_2006 Lead3_2007 Lead5_2008
State-A 2000 2004 -4 1 0 0 0 0 0 0 0
State-A 2001 2004 -3 1 0 0 0 0 0 0 0
State-A 2002 2004 -2 0 1 0 0 0 0 0 0
State-A 2003 2004 -1 0 0 1 0 0 0 0 0
State-A 2004 2004 0 0 0 0 1 0 0 0 0
State-A 2005 2004 1 0 0 0 0 1 0 0 0
State-A 2006 2004 2 0 0 0 0 0 1 0 0
State-A 2007 2004 3 0 0 0 0 0 0 1 0
State-A 2008 2004 4 0 0 0 0 0 0 0 1
State-A 2009 2004 5 0 0 0 0 0 0 0 1
State-B 2000 2004 -4 1 0 0 0 0 0 0 0
State-B 2001 2004 -3 1 0 0 0 0 0 0 0
State-B 2002 2004 -2 0 1 0 0 0 0 0 0
State-B 2003 2004 -1 0 0 1 0 0 0 0 0
State-B 2004 2004 0 0 0 0 1 0 0 0 0
State-B 2005 2004 1 0 0 0 0 1 0 0 0
State-B 2006 2004 2 0 0 0 0 0 1 0 0
State-B 2007 2004 3 0 0 0 0 0 0 1 0
State-B 2008 2004 4 0 0 0 0 0 0 0 1
State-B 2009 2004 5 0 0 0 0 0 0 0 1
Since the user defined event window is between 2001 and 2008.
First generate 2004 -2001 (3 lag columns)
, 2008 - 2004 (4 lead columns)
, 1 (Lead0) column .
Start with the lowest column which is LagM3_2001
Enter
column is less or equal
to (2004-3)=2001 is assigned value 1
So based on this logic, values in column LagM3_2001
is 1 when Enter
is 2001 or less(2000,1999) etc.I will jump to the highest column which is Lead5_2008
Enter
column is greater than or equal
to (2004+4)=2008 is assigned value 1
So based on this logic, values in column Lead5_2008
is 1 when Enter
is 2008 or higher(2008,2009) etc.This is the logic for any columns that are in between the lowest(LagM3_2001
) and highest (Lead5_2008
)
Values in Column (LagM2_2002
) is 1 if; 2004-2 = 2002
& value in Event
column is 2002, 0 everywhere else
Values in Column (LagM2_2003
) is 1 if; 2004-1 = 2003
& value in Event
column is 2001, 0 everywhere else
Values in Column (Lead0_2004
) is 1 if; 2004-0 = 2004
& value in Event
column is 2000, 0 everywhere else
Values in Column (Lead1_2005
) is 1 if; 2004+1 = 2005
& value in Event
column is 2005, 0 everywhere else
Values in Column (Lead1_2006
) is 1 if; 2004+2 = 2006
& value in Event
column is 2006, 0 everywhere else
Values in Column (Lead1_2007
) is 1 if; 2004+3 = 2007
& value in Event
column is 2007, 0 everywhere else
I could do this using lot of ifelse
statements, but if there is an efficient way to code this, I would much appreciate it. Thanks in advance.
pacman::p_load(tidyverse, glue)
create_eventdf <- function(df, start, end) {
df |>
mutate(Category = case_when(Enter <= start ~ glue("LagM{Event-start}_{start}"),
Enter >= end ~ glue("Lead{Event-Enter+1}_{end}"),
Enter < Event ~ glue("LagM{Event-Enter}_{Enter}"),
TRUE ~ glue("Lead{Event-Enter+1}_{Event}")))|>
pivot_wider( names_from = Category, values_from = Category, values_fn = length, values_fill = 0, unused_fn = NULL)
}
create_eventdf(df, 2001, 2008)
Output:
# A tibble: 20 × 13
State Enter Event Lag LagM3_2001 LagM2_2002 LagM1_2003 Lead1_2004
<chr> <int> <int> <int> <int> <int> <int> <int>
1 State-A 2000 2004 -4 1 0 0 0
2 State-A 2001 2004 -3 1 0 0 0
3 State-A 2002 2004 -2 0 1 0 0
4 State-A 2003 2004 -1 0 0 1 0
5 State-A 2004 2004 0 0 0 0 1
6 State-A 2005 2004 1 0 0 0 0
7 State-A 2006 2004 2 0 0 0 0
8 State-A 2007 2004 3 0 0 0 0
9 State-A 2008 2004 4 0 0 0 0
10 State-A 2009 2004 5 0 0 0 0
11 State-B 2000 2004 -5 1 0 0 0
12 State-B 2001 2004 -4 1 0 0 0
13 State-B 2002 2004 -3 0 1 0 0
14 State-B 2003 2004 -2 0 0 1 0
15 State-B 2004 2004 -1 0 0 0 1
16 State-B 2005 2004 0 0 0 0 0
17 State-B 2006 2004 1 0 0 0 0
18 State-B 2007 2004 2 0 0 0 0
19 State-B 2008 2004 3 0 0 0 0
20 State-B 2009 2004 4 0 0 0 0
# ℹ 5 more variables: Lead0_2004 <int>, `Lead-1_2004` <int>,
# `Lead-2_2004` <int>, `Lead-3_2008` <int>, `Lead-4_2008` <int>