
How to remove people who never meet a condition from a data frame


Let's say this is my df:

people <- c(1,1,1,2,2,3,3,4,4,5,5)
activity <- c(1,1,1,2,2,3,4,5,5,6,6)
completion <- c(0,0,1,0,1,1,1,0,0,0,1)

And I would like to remove all people that never completed any activity.

I have tried this code, but somehow it does not work. I have no idea what could be wrong here.

nevercompleted<- df %>% 
  filter(completion != 0) %>% 
  group_by(people) %>% 
  summarise("frequency activity" = n())

df<- -c (df$nevercompleted)

So, in this scenario, person 4 should be removed from the df. Note that I am only interested in removing those who never completed anything, like person 4, not person 1, who does complete an activity at some point.


Solution

    1. Base R

    In base R, the following can easily be rewritten as a one-liner.

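    # i is TRUE for every row whose person has at least one non-zero completion
    i <- ave(as.logical(df$completion), df$people, FUN = function(x) any(x != 0, na.rm = TRUE))
    # which() keeps only the rows where i is TRUE
    df <- df[which(i), ]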
    df
    #   people activity completion
    #1       1        1          0
    #2       1        1          0
    #3       1        1          1
    #4       2        2          0
    #5       2        2          1
    #6       3        3          1
    #7       3        4          1
    #10      5        6          0
    #11      5        6          1
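
    The one-liner mentioned above could look something like the sketch below. It assumes the example data from the Data section, and it stores the result in a new object, kept (a name chosen here for illustration), rather than overwriting df.

    ```r
    # Rebuild the example data (same vectors as in the question).
    people <- c(1,1,1,2,2,3,3,4,4,5,5)
    activity <- c(1,1,1,2,2,3,4,5,5,6,6)
    completion <- c(0,0,1,0,1,1,1,0,0,0,1)
    df <- data.frame(people, activity, completion)

    # Keep a row only if its person has at least one non-zero completion.
    # na.rm = TRUE makes any() return FALSE, not NA, for groups with missing values.
    kept <- df[ave(df$completion != 0, df$people,
                   FUN = function(x) any(x, na.rm = TRUE)), ]
    kept
    ```

    Because ave() returns a vector as long as the input, the logical index lines up with the rows of df and the original row names are preserved.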
    

    2. Package dplyr

    And here is a dplyr way.

    First summarise to one row per person and keep only the people who have completed at least one activity, then join back to the original data set to recover all columns.

    library(dplyr)

    df <- df %>%
      group_by(people) %>%
      summarise(completion = any(as.logical(completion))) %>%
      filter(completion) %>%
      select(-completion) %>%
      left_join(df, by = 'people')
    
    df
    #`summarise()` ungrouping output (override with `.groups` argument)
    ## A tibble: 9 x 3
    #  people activity completion
    #   <dbl>    <dbl>      <dbl>
    #1      1        1          0
    #2      1        1          0
    #3      1        1          1
    #4      2        2          0
    #5      2        2          1
    #6      3        3          1
    #7      3        4          1
    #8      5        6          0
    #9      5        6          1
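
    As a possible alternative to the summarise-and-join approach, a grouped filter() can test the condition per person directly. This is only a sketch under the same example data; kept is again an illustrative name, not something from the question.

    ```r
    library(dplyr)

    # Rebuild the example data (same vectors as in the question).
    people <- c(1,1,1,2,2,3,3,4,4,5,5)
    activity <- c(1,1,1,2,2,3,4,5,5,6,6)
    completion <- c(0,0,1,0,1,1,1,0,0,0,1)
    df <- data.frame(people, activity, completion)

    kept <- df %>%
      group_by(people) %>%
      # any() yields one TRUE/FALSE per group, which filter() applies to all of its rows
      filter(any(completion != 0, na.rm = TRUE)) %>%
      ungroup()
    kept
    ```

    This avoids the join because the rows are never collapsed, so all columns are kept as they are.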
    

    Data

    The question only creates the column vectors and never calls data.frame, so the data frame is built here.

    people <- c(1,1,1,2,2,3,3,4,4,5,5)
    activity <- c(1,1,1,2,2,3,4,5,5,6,6)
    completion <- c(0,0,1,0,1,1,1,0,0,0,1)
    df <- data.frame(people, activity, completion)
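
    As a quick sanity check on this data, one could list the people who never completed anything before removing them. The sketch below uses tapply(); the names never and dropped are made up for illustration.

    ```r
    # Rebuild the example data (same vectors as in the question).
    people <- c(1,1,1,2,2,3,3,4,4,5,5)
    activity <- c(1,1,1,2,2,3,4,5,5,6,6)
    completion <- c(0,0,1,0,1,1,1,0,0,0,1)
    df <- data.frame(people, activity, completion)

    # For each person, TRUE if every completion value is 0.
    never <- tapply(df$completion, df$people, function(x) all(x == 0))

    # Person ids (as character) that would be removed.
    dropped <- names(never)[never]
    dropped
    ```

    With the example data, this should return only person 4, matching the expectation stated in the question.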