Search code examples
rloopsif-statementlapplycreate-table

Create column with a certain condition in R


I have this data.

OPENING CLOSE 
2007     2008   
2009     2010    
2004      NA   

and I would like to make this column

OPENING CLOSE Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010
2007     2008                     1     1
2005     2008         1     1     1     1                                   
2004      NA    1     1     1     1     1     1     1

It is possible to create this column step by step with if function, while I'd like to make loop or lapply function.

Additionally, I want to make this column(S~) using a certain condition.

If a column (Y2007) is 1 and the column 3 years ago is 1 (Y2005), The new column (S2007) is 1 and otherwise 0.

OPENING CLOSE Y2004 Y2005 Y2006 Y2007 Y2008 Y2009 Y2010 | S2007 S2008 S2009
2007     2008                     1     1               |   0     0     0
2005     2008         1     1     1     1               |   1     1     0
2004      NA    1     1     1     1     1     1     1   |   1     1     1

How do I make the script in R?


Solution

  • A solution from the tidyverse. dt3 is the first desired output, while dt5 is the second desired output. There is no need to use loops here.

    # Create example data frame
    dt <- read.table(text = "OPENING CLOSE 
    2007     2008   
                     2005     2008    
                     2004      NA   ",
                     header = TRUE, stringsAsFactors = FALSE)
    
    # Load package
    library(tidyverse)
    
    dt2 <- dt %>%
      mutate(ID = 1:n(), EndYear = ifelse(is.na(CLOSE), 2010, CLOSE)) %>%
      # Create year range list
      mutate(YearRange = map2(OPENING, EndYear, `:`)) %>%
      # Unnest the list column
      unnest() %>%
      mutate(YearRange = paste0("Y", YearRange)) %>%
      mutate(Value = 1) %>%
      # Spread based on YearRange and Value
      spread(YearRange, Value)
    
    # Desired output 1  
    dt3 <- dt2 %>%  
      arrange(ID) %>%
      select(-ID, -EndYear)
    
    dt4 <- dt2 %>%
      gather(YearRange, Value, Y2004:Y2010) %>%
      arrange(ID) %>%
      group_by(ID) %>%
      # Set the lag year here, using 3 years ago as an example
      mutate(Value2 = lag(Value, 2)) %>%
      # Evaluate the condition bewteen one year and 3 years ago
      mutate(Value3 = ifelse(Value %in% 1 & Value2 %in% 1, 1, 0)) %>%
      mutate(YearRange = sub("Y", "S", YearRange)) %>%
      select(ID, YearRange, Value3) %>%
      # Filter for S2007 o S2009
      filter(YearRange %in% paste0("S", 2007:2009)) %>%
      spread(YearRange, Value3)
    
    # Desired output 2
    dt5 <- dt2 %>%
      left_join(dt4, by = "ID") %>%
      arrange(ID) %>%
      select(-ID, -EndYear)