Group and then mutate


I have a dataset where I want to select only one row for each individual each year. However, I would like to mutate a column so that if it says 'yes' for any of that person's rows, then all the rows say 'yes'.

This is an example of the dataset I have:

[Image: example of the dataset]

So where the name, clinic and year are the same, I want the tested column to say 'yes' if any of the other rows for that grouping say 'yes'.

Therefore, this is what I would want the dataset to finally look like:

[Image: desired final dataset]


Solution

  • This is quite straightforward using dplyr. Here is an option:

    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    
    df <- tribble(
      ~ name, ~ clinic, ~ year, ~ date, ~ tested,
      "a",       "xxy",    2022,  "April", "yes",
      "a",       "xxy",    2022,  "May", "no",
      "b",       "ggf",    2019,  "Jan", "no",
      "b",       "ggf",    2019,  "Feb", "yes",
      "c",       "ffr",    2018,  "March", "yes",
      "c",       "ffr",    2019,  "May", "no"
    )
    
    df |> 
      mutate(tested2 = if_else(any(tested == "yes"), "yes", "no"), .by = c(name, clinic, year))
    #> # A tibble: 6 × 6
    #>   name  clinic  year date  tested tested2
    #>   <chr> <chr>  <dbl> <chr> <chr>  <chr>  
    #> 1 a     xxy     2022 April yes    yes    
    #> 2 a     xxy     2022 May   no     yes    
    #> 3 b     ggf     2019 Jan   no     yes    
    #> 4 b     ggf     2019 Feb   yes    yes    
    #> 5 c     ffr     2018 March yes    yes    
    #> 6 c     ffr     2019 May   no     no
    

    Created on 2024-02-25 with reprex v2.1.0
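
The `.by` argument used above was added in dplyr 1.1.0. On an older version, a roughly equivalent sketch uses `group_by()` and `ungroup()` instead. It assumes the same example data, groups by `clinic` as well (as the question describes), and uses the `%>%` pipe so that it does not depend on R 4.1 or later. The name `res` is only for illustration.

```r
library(dplyr)

# Same example data as in the answer above
df <- tribble(
  ~name, ~clinic, ~year, ~date,   ~tested,
  "a",   "xxy",   2022,  "April", "yes",
  "a",   "xxy",   2022,  "May",   "no",
  "b",   "ggf",   2019,  "Jan",   "no",
  "b",   "ggf",   2019,  "Feb",   "yes",
  "c",   "ffr",   2018,  "March", "yes",
  "c",   "ffr",   2019,  "May",   "no"
)

res <- df %>%
  group_by(name, clinic, year) %>%
  # any() returns a single TRUE/FALSE per group; mutate() recycles it to every row
  mutate(tested2 = if_else(any(tested == "yes"), "yes", "no")) %>%
  # drop the grouping so later steps work on the whole table again
  ungroup()

print(res)
```

If `tested` can contain missing values, `any(tested == "yes", na.rm = TRUE)` may be the safer condition, since `any()` otherwise returns `NA` for a group that has an `NA` and no 'yes'.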

    I would recommend reading this question before posting future questions. It makes it easier to help you.
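
The question also asks for only one row per person per year, which the `mutate()` step alone does not do. One possible follow-up, offered as a sketch rather than as part of the original answer, is to collapse each group with `summarise()`. The question does not say which `date` to keep, so keeping the first one in each group is an assumption, and `one_row` is an illustrative name. This needs dplyr 1.1.0 or later for `.by`.

```r
library(dplyr)

# Same example data as in the answer above
df <- tribble(
  ~name, ~clinic, ~year, ~date,   ~tested,
  "a",   "xxy",   2022,  "April", "yes",
  "a",   "xxy",   2022,  "May",   "no",
  "b",   "ggf",   2019,  "Jan",   "no",
  "b",   "ggf",   2019,  "Feb",   "yes",
  "c",   "ffr",   2018,  "March", "yes",
  "c",   "ffr",   2019,  "May",   "no"
)

one_row <- df |>
  summarise(
    date   = first(date),  # assumption: keep the first date seen in each group
    tested = if_else(any(tested == "yes"), "yes", "no"),
    .by = c(name, clinic, year)
  )

print(one_row)
```

With the example data this yields four rows, one for each combination of `name`, `clinic` and `year`.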