I have this data.
OPENING CLOSE
2007 2008
2009 2010
2004 NA
and I would like to make this column
OPENING CLOSE Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010
2007 2008 1 1
2005 2008 1 1 1 1
2004 NA 1 1 1 1 1 1 1
It is possible to create this column step by step with if function, while I'd like to make loop or lapply function.
Additionally, I want to make this column(S~) using a certain condition.
If a column (Y2007) is 1 and the column 3 years ago is 1 (Y2005), The new column (S2007) is 1 and otherwise 0.
OPENING CLOSE Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010 | S2007 S2008 S2009
2007 2008 1 1 | 0 0 0
2005 2008 1 1 1 1 | 1 1 0
2004 NA 1 1 1 1 1 1 1 | 1 1 1
How do I make the script in R?
A solution from the tidyverse
. dt3
is the first desired output, while dt5
is the second desired output. There is no need to use loops
here.
# Create example data frame
dt <- read.table(text = "OPENING CLOSE
2007 2008
2005 2008
2004 NA ",
header = TRUE, stringsAsFactors = FALSE)
# Load package
library(tidyverse)
dt2 <- dt %>%
mutate(ID = 1:n(), EndYear = ifelse(is.na(CLOSE), 2010, CLOSE)) %>%
# Create year range list
mutate(YearRange = map2(OPENING, EndYear, `:`)) %>%
# Unnest the list column
unnest() %>%
mutate(YearRange = paste0("Y", YearRange)) %>%
mutate(Value = 1) %>%
# Spread based on YearRange and Value
spread(YearRange, Value)
# Desired output 1
dt3 <- dt2 %>%
arrange(ID) %>%
select(-ID, -EndYear)
dt4 <- dt2 %>%
gather(YearRange, Value, Y2004:Y2010) %>%
arrange(ID) %>%
group_by(ID) %>%
# Set the lag year here, using 3 years ago as an example
mutate(Value2 = lag(Value, 2)) %>%
# Evaluate the condition bewteen one year and 3 years ago
mutate(Value3 = ifelse(Value %in% 1 & Value2 %in% 1, 1, 0)) %>%
mutate(YearRange = sub("Y", "S", YearRange)) %>%
select(ID, YearRange, Value3) %>%
# Filter for S2007 o S2009
filter(YearRange %in% paste0("S", 2007:2009)) %>%
spread(YearRange, Value3)
# Desired output 2
dt5 <- dt2 %>%
left_join(dt4, by = "ID") %>%
arrange(ID) %>%
select(-ID, -EndYear)