
R: Generate a dummy variable based on the existence of one column's value in another column


I have a data frame like this:

A                    B          
2012,2013,2014     2011
2012,2013,2014     2012
2012,2013,2014     2013
2012,2013,2014     2014
2012,2013,2014     2015

I want to create a dummy variable that indicates whether the value in column B exists in column A: 1 if it does, 0 if it does not. The result should look like this:

A                    B       dummy        
2012,2013,2014     2011        0
2012,2013,2014     2012        1
2012,2013,2014     2013        1
2012,2013,2014     2014        1
2012,2013,2014     2015        0

I have tried to use %in% to achieve this:

df$dummy <- ifelse(df$B %in% df$A, 1, 0)

but every value in the dummy column turned out to be 1.

The same thing happened when I tried another method, any():

df$dummy <- any(df$A==df$B)

Every value in the dummy column is TRUE.

Is there an efficient way to generate this dummy variable?

Many thanks!


Solution

  • It looks like each value in column A is a single string of numbers separated by commas, so %in% is not appropriate here. %in% compares whole elements, so it would only help if you were looking for B in a vector of several separate strings (or numbers, if A and B were numeric). If your data frame is structured differently, please let me know (and feel free to edit your question).
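
    To illustrate the point about whole-element comparison, here is a minimal sketch. The names single_string and split_years are made up for this example and are not part of the question's data.

    ```r
    # One string that merely contains commas: %in% sees a single element
    single_string <- "2012,2013,2014"

    # The same years as three separate elements
    split_years <- c("2012", "2013", "2014")

    in_single <- "2012" %in% single_string  # FALSE: "2012" is not equal to the whole string
    in_split  <- "2012" %in% split_years    # TRUE: "2012" equals one of the elements

    print(c(in_single = in_single, in_split = in_split))
    ```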

    There are probably several ways to do this. One easy way is to use grepl one row at a time to check whether the value in column B is present in A.

    library(tidyverse)
    
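    # grepl() only uses the first element of its pattern, so evaluate it row by row
    df %>%
      rowwise() %>%
      # unary + converts the logical result to integer 0/1
      mutate(dummy = +grepl(B, A))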
    

    Output

    # A tibble: 5 x 3
      A              B     dummy
      <fct>          <fct> <int>
    1 2012,2013,2014 2011      0
    2 2012,2013,2014 2012      1
    3 2012,2013,2014 2013      1
    4 2012,2013,2014 2014      1
    5 2012,2013,2014 2015      0
    

    Data

    df <- data.frame(
      A = c(rep("2012,2013,2014", 5)),
      B = c("2011", "2012", "2013", "2014", "2015")
    )
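
    One caveat: grepl does substring matching, so a value such as "201" would also count as present in "2012,2013,2014". If exact matches are required, a possible alternative is to split A on the commas and then use %in% row by row. This is only a sketch in base R, assuming the values contain no spaces around the commas; the intermediate name years is made up for the example.

    ```r
    df <- data.frame(
      A = rep("2012,2013,2014", 5),
      B = c("2011", "2012", "2013", "2014", "2015"),
      stringsAsFactors = FALSE  # keep A and B as character, not factor
    )

    # Split each string in A into a character vector of individual years
    years <- strsplit(df$A, ",", fixed = TRUE)

    # For each row, test whether B equals one of the split values exactly
    df$dummy <- as.integer(mapply(function(b, a) b %in% a, df$B, years))

    print(df$dummy)  # 0 1 1 1 0
    ```

    On the sample data this gives the same result as the grepl version. The two only differ when B could be a partial match of one of the values in A.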