Search code examples
rdplyrcoalesce

Putting the content of many columns in a new single column


Thanks for the answers to my previous question, but I need to update it, since the solutions don't work with my real life example, i.e., a 3170x11 dataframe.

Let me briefly recapitulate it. I have a 3170X11 dataframe, filled with the terms 'Normale', 'Delezioni' or NA. I would like to coalesce the column results in a new columns, reporting the type of term reported, that is indeed 'Normale', 'Delezioni' or 'NA'. In case 'Normale' and 'NA' are present on the same row, it should be reported 'Normale'. In case 'Delezioni' and 'NA' are present on the same row, it should be reported 'Delezioni'. In case only 'NA' are present should be reported 'NA'. However in case both 'Normali' and 'Delezioni' are present, it should be reported 'Error'. Akrun and others reported a nice solution (Coalescing many columns into one column), but, as I said, doesn't work when things become bigger:

library (RCurl)
a <- getURL('http://download1645.mediafire.com/pp9z3okh5tgg/96px8ophovxrxe9/example.tab')
df2 <- read.table(text=a,header=TRUE, sep = "\t")
df2 <- data.frame(lapply(df2, as.character), stringsAsFactors=FALSE) #converts from factor to character
res <- df2 %>%
   mutate_if(~ all(is.na(.)) && is.logical(.), ~ NA_character_) %>%
   transmute(Summary = case_when(rowSums(!is.na(.)) > 1 ~ "Error",
            TRUE ~ coalesce(!!! .)))

res contains several mistakes. For instance the first lines should be:

  Summary
1   Normale
2    <NA>
3    <NA>
4    <NA>
5   Normale
6   Normale

Instead they are:

> head (res)
  Summary
1   Error
2    <NA>
3    <NA>
4    <NA>
5   Error
6   Error 

Thanks


Solution

  • I think you can define a simple function which works based on your requirement

    apply_fun <- function(x) {
      if(all(c("Delezioni","Normale") %in% x)) return('Error')
      if("Delezioni" %in% x) return('Delezioni')
      if("Normale" %in% x)  return('Normale')
      else NA
    }
    

    and then apply it row-wise

    example$answer <- apply(example, 1, apply_fun)
    head(example$answer)
    #[1] "Normale" NA        NA        NA        "Normale" "Normale"
    

    If needed a tidyverse/dplyr answer, we can convert these multiple if statements to case_when and then use pmap

    library(tidyverse)
    
    apply_fun <- function(x) {
      case_when(all(c("Delezioni","Normale") %in% x) ~ "Error", 
                "Delezioni" %in% x ~ "Delezioni", 
                "Normale" %in% x ~ "Normale", 
                TRUE ~NA_character_)
    }
    
    output <- example %>% mutate(answer = pmap_chr(., ~apply_fun(c(...))))