Search code examples
rdplyrsummarize

Using dplyr summarise with conditions


I am currently trying to apply the summarise function in order to isolate the relevant observations from a large data set. A simple reproducible example is given here:

df <- data.frame(c(1,1,1,2,2,2,3,3,3), as.logical(c(TRUE,FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE)),
                 as.numeric(c(0,5,0,0,0,0,7,0,7)))
colnames(df) <- c("ID", "Status", "Price")

  ID Status Price
1  1   TRUE     0
2  1  FALSE     5
3  1   TRUE     0
4  2   TRUE     0
5  2   TRUE     0
6  2   TRUE     0
7  3  FALSE     7
8  3   TRUE     0
9  3  FALSE     7

I would like to sort the table by observation and get the status TRUE only if all three observations are TRUE (figured out) and then want to get the price corresponding to the status (i.e. 5 for observation 1 as FALSE, 0 for observation 2 as TRUE and 7 for observation 3 as FALSE).

From Summarize with conditions in dplyr I have figured out that I can - just as usually - specify the conditions in square brackets. My code so far thus looks like this:

library(dplyr)
result <- df %>%
  group_by(ID) %>%
  summarize(Status = all(Status), Test = ifelse(all(Status) == TRUE,
 first(Price[Status == TRUE]), first(Price[Status == FALSE]))) 

# This is what I get: 
# A tibble: 3 x 3
     ID Status  Test
  <dbl> <lgl>  <dbl>
1    1. FALSE     0.
2    2. TRUE      0.
3    3. FALSE     7.

But as you can see, for ID = 1 it gives an incorrect price. I have been trying this forever, so I would appreciate any hint as to where I have been going wrong.


Solution

  • We could keep the all(Status) as second argument in summarise (or change the column name) and also, it can be done with if/else as the logic seems to return a single TRUE/FALSE based on whether all of the 'Status' is TRUE or not

    df %>%
       group_by(ID) %>% 
       summarise( Test = if(all(Status)) first(Price[Status]) else 
                       first(Price[!Status]), Status = all(Status))
    # A tibble: 3 x 3
    #     ID  Test Status
    #   <dbl> <dbl> <lgl> 
    #1     1     5 FALSE 
    #2     2     0 TRUE  
    #3     3     7 FALSE 
    

    NOTE: It is better not to use ifelse with unequal lengths for its arguments