
Nested ifelse() statements inside summarize() in dplyr (R)


I am trying to summarise a data frame grouped by the label column. For each group I want a value based on the following conditions:

- if all the numbers are NA, return NA
- if the mean of all the numbers is 1 or lower, return 1
- if the mean of all the numbers is higher than 1, return the mean of the values in the group that are greater than 1
- in all other cases, return 100

I managed to find the answer, and my code now runs correctly: the first ifelse() statement needed is.na() instead of == NA, and that was the issue.
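
A small base-R sketch of why the comparison fails while is.na() works. The variable name m is only for illustration and is not part of the code in this question:

```r
# Mean of an all-NA group after dropping NAs: nothing is left, so the result is NaN
m <- mean(c(NA_real_, NA_real_, NA_real_), na.rm = TRUE)

m == NA      # NA: a comparison with NA is never TRUE or FALSE
m == NaN     # NA as well
is.na(m)     # TRUE: is.na() is TRUE for both NA and NaN
is.nan(m)    # TRUE: is.nan() is TRUE only for NaN

# ifelse() propagates an NA test, so the result is NA whatever the branches are
ifelse(m == NA, "missing", "present")
```

Because the test itself is NA, neither branch of ifelse() is ever selected, which is why every group ended up as NA.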

library(dplyr)

label <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7)
sev <- c(NA,NA,NA,NA,1,0,1,1,1,NA,1,2,2,4,5,1,0,1,1,4,5)
Data2 <- data.frame(label,sev)

d <- Data2 %>%
        group_by(label) %>%
        summarize(sevmean = ifelse(is.na(mean(sev,na.rm=TRUE)),NA,
                                 ifelse(mean(sev,na.rm=TRUE)<=1,1,
                                        ifelse(mean(sev,na.rm=TRUE)>1,
                                               mean(sev[sev>1],na.rm=TRUE),100))))

Solution

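  • Your first condition is the issue here. A comparison with == NaN always returns NA, so ifelse() returns NA for every group. If we remove the nested ifelse() calls and keep only the first one, we get the same all-NA output: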

    Data2 %>%
       group_by(label) %>%
       summarise(sevmean = ifelse(mean(sev,na.rm=TRUE)==NaN,NA,1))
    
    #  label sevmean
    #  <dbl> <lgl>  
    #1  1.00 NA     
    #2  2.00 NA     
    #3  3.00 NA     
    #4  4.00 NA     
    #5  5.00 NA     
    #6  6.00 NA     
    #7  7.00 NA     
    

    I am not sure why you are checking for NaN (the mean of an all-NA group with na.rm=TRUE is NaN), but if you want to do that, check it with is.nan() instead of ==:

    Data2 %>%
      group_by(label) %>%
       summarize(sevmean = ifelse(is.nan(mean(sev,na.rm=TRUE)),NA,
                             ifelse(mean(sev,na.rm=TRUE)<=1,1,
                                    ifelse(mean(sev,na.rm=TRUE)>1,
                                           mean(sev[sev>1],na.rm=TRUE),100))))
    
    
    #  label sevmean
    #  <dbl>   <dbl>
    #1  1.00    NA   
    #2  2.00    1.00
    #3  3.00    1.00
    #4  4.00    2.00
    #5  5.00    3.67
    #6  6.00    1.00
    #7  7.00    4.50
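
The nested ifelse() calls above compute the group mean three times. Since each group produces a single value, one possible alternative is a small helper that uses plain if statements. This is only a sketch, not the method used in the answer, and the helper name sev_summary is hypothetical:

```r
library(dplyr)

label <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7)
sev <- c(NA,NA,NA,NA,1,0,1,1,1,NA,1,2,2,4,5,1,0,1,1,4,5)
Data2 <- data.frame(label, sev)

# Hypothetical helper: takes one group's values and returns a single number
sev_summary <- function(x) {
  m <- mean(x, na.rm = TRUE)        # NaN when every value in the group is NA
  if (is.nan(m)) return(NA_real_)   # all values missing
  if (m <= 1) return(1)             # mean of 1 or lower
  mean(x[x > 1], na.rm = TRUE)      # mean of the values greater than 1
}

d <- Data2 %>%
  group_by(label) %>%
  summarise(sevmean = sev_summary(sev))
```

Once the NaN case is handled, the mean is either at most 1 or greater than 1, so the final fallback of 100 in the nested version can never be reached, and the sketch leaves it out.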