
Function to convert set of categorical variables to single vector


There are many posts about creating dummy variables, but in my case I have a set of columns similar to dummy variables which need recoding back into one column.

Given a set of categorical/string variables (counties in the USA):

a <- c(NA, NA, "Cameron", "Luzerne")
b <- c(NA, "Luzerne", NA, NA)
c <- c("Chester", NA, NA, NA)
df <- as.data.frame(cbind(a, b, c))

How can I create a function that converts them to a single categorical column? The function should work for any contiguous set of string columns.

Result should look like this:

newcol    a         b         c
Chester   <NA>      <NA>      Chester
Luzerne   <NA>      Luzerne   <NA>
Cameron   Cameron   <NA>      <NA>
Luzerne   Luzerne   <NA>      <NA>

I wrote this function, which takes three arguments:

cn<-function(df,s,f){
  for(i in seq_along(df[ ,c(s:f)]) )  # for specified columns in a dataframe...
  ifelse(is.na(df[,i]),NA,df[ ,i] )   # return value if not NA
  }

But it doesn't work, and several similar attempts have failed too.

The idea is to take a data frame with some number of string columns and move their values, if not blank, to the new column.


Solution

  • We can use coalesce, which returns the first non-NA value across its arguments

    library(dplyr)
    df %>%
        mutate_all(as.character) %>%
        mutate(newcolumn = coalesce(!!! .)) %>%
        select(newcolumn, everything())
    #   newcolumn       a       b       c
    #1   Chester    <NA>    <NA> Chester
    #2   Luzerne    <NA> Luzerne    <NA>
    #3   Cameron Cameron    <NA>    <NA>
    #4   Luzerne Luzerne    <NA>    <NA>
    

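    In base R, an option is pmax. With na.rm = TRUE it returns the row-wise maximum of the non-NA strings, which is the desired value here because each row has exactly one non-NA entry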

    do.call(pmax, c(lapply(df, as.character), na.rm = TRUE))
    #[1] "Chester" "Luzerne" "Cameron" "Luzerne"
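
The question asks for a function that works on any contiguous range of columns. A minimal base R sketch of that idea follows. The helper name first_non_na and its name argument are hypothetical and not part of the original answer. It walks columns s through f from left to right and keeps the first non-NA value in each row.

```r
# Hypothetical helper (an illustration, not from the original answer):
# collapse columns s..f of df into one column holding the first non-NA
# value per row, and put that column in front of the original data.
first_non_na <- function(df, s, f, name = "newcol") {
  # Convert each selected column to character so factors compare as strings
  cols <- lapply(df[s:f], as.character)
  # Reduce goes through the columns left to right; wherever the running
  # result is still NA, it is filled with the value from the next column
  out <- Reduce(function(x, y) ifelse(is.na(x), y, x), cols)
  cbind(setNames(data.frame(out, stringsAsFactors = FALSE), name), df)
}

# Same data as in the question
df <- data.frame(
  a = c(NA, NA, "Cameron", "Luzerne"),
  b = c(NA, "Luzerne", NA, NA),
  c = c("Chester", NA, NA, NA),
  stringsAsFactors = FALSE
)

res <- first_non_na(df, 1, 3)
res$newcol
#[1] "Chester" "Luzerne" "Cameron" "Luzerne"
```

Unlike pmax, which picks the alphabetically largest string, this version takes the leftmost non-NA value, so the two differ only when a row has more than one non-NA entry.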