Search code examples
rstr-replacegreplrecode

Recode multiple columns based on grep column label outcome in R


I have multiple columns which I'd like to recode (Yes - 1, No - 0, Null - NA) in R. These columns all contain the word 'Flag' as part of the column labels and I have about 60 of these columns to recode. I've also got column names ending with other words which I am hoping to apply the same logic and recode them in batches.

Here's an example of my data frame. I imported it as .csv file to RStudio and set stringsAsFactors = F

> test <- data.frame(ID = c("86224AA8", "911D8EF", "959661A0", "A4935669", "9A77218A", "19884814",  "017E5338", "6DBCFBB"), CreatedDate = c("18/11/2015", "18/12/2015", "15/11/2015", "13/11/2015", "08/09/2015", "07/11/2013", "18/11/2015", "18/11/2015"), V2Flag = c("No", "No", "No", "No", "Yes", "Yes", "NULL", "Yes"), V3Flag = c("Yes", "NULL", "Yes", "No", "Yes", "Yes", "NULL", "Yes"), V4Flag = c("No", "NULL", "Yes", "No", "Yes", "No", "NULL", "No"))
> test
         ID CreatedDate V2Flag V3Flag V4Flag
1 86224AA8  18/11/2015     No    Yes     No
2  911D8EF  18/12/2015     No   NULL   NULL
3 959661A0  15/11/2015     No    Yes    Yes
4 A4935669  13/11/2015     No     No     No
5 9A77218A  08/09/2015    Yes    Yes    Yes
6 19884814  07/11/2013    Yes    Yes     No
7 017E5338  18/11/2015   NULL   NULL   NULL
8  6DBCFBB  18/11/2015    Yes    Yes     No

Here's my attempt to recode the Yes/ No responses in column names ending 'Flag' in R.

> test[, grepl("Flag", names(test)) == 'No'] <- 0 
> test[, grepl("Flag", names(test)) == 'Yes'] <- 1 
> test[, grepl("Flag", names(test)) == 'NULL'] <- NA

These lines ran fine in R and didn't return any errors. However as shown on the global environment these columns still doesn't display Yes/No outputs as 1/0.

If I first subset my dataset using 'grepl', store the selected columns ending 'Flag' in a separate dataframe. I didn't have any issues with recoding the binary response.

Can you suggest what has gone wrong in my code, and how I can select columns based on their names and recode them (without subsetting my dataframe)?

Thank you!


Solution

  • Here's an approach to recode No. Repeat for others.

    #Convert columns 3, 4, and 5 to character
    #This may or may not be necessary for your actual data
    test[,3:5] = lapply(test[,3:5], as.character)
    
    #Obtain column numbers where 'Flag' is present
    ind1 = which(grepl("Flag", names(test)))
    
    #Obtain indices of where the values are 'No'
    ind2 = which(test == "No", arr.ind = TRUE)
    
    #Keep only those values in ind2 where column numbers match with ind1
    ind2 = ind2[ind2[,2] %in% ind1,]
    
    #Recode values to zero
    test[ind2] = 0