r, list, dataframe, recode

Check the value of all rows in a column to see if it is in a list, return bool value, without for loop


I have a data frame column named "occupation" with values 1, 2, 3, 5, 6, 7, 8, 9. I need to construct a new column, say occupation2. A row in the new column takes the value 1 if the value in the old column is one of 2, 3, 6, 7; otherwise it takes 0. In my real data, the "occupation" column can take about 90 different values, and about 10 of them should map to 1 in the new column, so I don't want to write about 10 separate conditions.

What I did was create a vector containing the values used to dichotomize the new column, say value_list = c(2, 3, 6, 7). I would also like to avoid a for loop. Pseudocode would look like the following:

df$occupation2 <- 0
value_list = c(2, 3, 6, 7)
df['occupation2'] <- 1 where occupation's value isin value_list

Solution

  • df[['occupation2']] <- as.integer(df[['occupation']] %in% value_list) should work. %in% is the right operator for this job: it tests each element of the left-hand vector for membership in the right-hand one and returns a logical (TRUE/FALSE) vector, which as.integer() converts to 1/0.

    (Also, when extracting a single column from a data frame, use either data[, column] or data[[column]] to get the column itself; data[column] gives a one-column data frame rather than the column.)
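
    As a minimal sketch of the above, here is the one-liner applied to a small data frame. The sample df is an assumption built from the values listed in the question; only the column names and value_list come from the original post.

```r
# Hypothetical sample data, using the values mentioned in the question.
df <- data.frame(occupation = c(1, 2, 3, 5, 6, 7, 8, 9))
value_list <- c(2, 3, 6, 7)

# %in% checks every element of the column against value_list at once
# (no for loop); as.integer() turns TRUE/FALSE into 1/0.
df[["occupation2"]] <- as.integer(df[["occupation"]] %in% value_list)

print(df$occupation2)
# 0 1 1 0 1 1 0 0

# [[ ]] returns the column itself, while [ ] with a single name
# returns a one-column data frame.
is.numeric(df[["occupation"]])    # TRUE
is.data.frame(df["occupation"])   # TRUE
```

    One thing to be aware of: NA %in% value_list is FALSE, so missing occupations would be coded as 0 rather than NA with this approach.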