I have a dataset where I want to select only one row for each individual each year. However, I would first like to mutate a column so that if it says 'yes' in any of that person's rows, then all of their rows say 'yes'.
This is an example of the dataset I have:
So where the name, clinic and year are the same, I want the tested column to say 'yes' if any of the other rows for that grouping say 'yes'.
Therefore, this is what I would want the dataset to finally look like:
This is quite straightforward using dplyr. Here is an option:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- tribble(
  ~ name, ~ clinic, ~ year, ~ date, ~ tested,
  "a", "xxy", 2022, "April", "yes",
  "a", "xxy", 2022, "May", "no",
  "b", "ggf", 2019, "Jan", "no",
  "b", "ggf", 2019, "Feb", "yes",
  "c", "ffr", 2018, "March", "yes",
  "c", "ffr", 2019, "May", "no"
)
df |>
  mutate(tested2 = if_else(any(tested == "yes"), "yes", "no"), .by = c(name, year))
#> # A tibble: 6 × 6
#>   name  clinic  year date  tested tested2
#>   <chr> <chr>  <dbl> <chr> <chr>  <chr>
#> 1 a     xxy     2022 April yes    yes
#> 2 a     xxy     2022 May   no     yes
#> 3 b     ggf     2019 Jan   no     yes
#> 4 b     ggf     2019 Feb   yes    yes
#> 5 c     ffr     2018 March yes    yes
#> 6 c     ffr     2019 May   no     no
Created on 2024-02-25 with reprex v2.1.0
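The question also asks for a single row per person per year, which the `mutate()` call above does not do on its own. A possible follow-up is sketched below, assuming dplyr 1.1.0 or later (needed for `.by`) and assuming the grouping should be name, clinic and year as described in the question. The object names `one_per_year` and `first_row_per_year` are made up for illustration. Which variant fits depends on whether the `date` column should survive.

```r
library(dplyr)

# Same example data as in the answer above
df <- tribble(
  ~name, ~clinic, ~year, ~date,   ~tested,
  "a",   "xxy",   2022,  "April", "yes",
  "a",   "xxy",   2022,  "May",   "no",
  "b",   "ggf",   2019,  "Jan",   "no",
  "b",   "ggf",   2019,  "Feb",   "yes",
  "c",   "ffr",   2018,  "March", "yes",
  "c",   "ffr",   2019,  "May",   "no"
)

# Variant 1: collapse each name/clinic/year group to one row.
# `tested` is "yes" if any row in the group says "yes"; `date` is dropped.
one_per_year <- df |>
  summarise(
    tested = if_else(any(tested == "yes"), "yes", "no"),
    .by = c(name, clinic, year)
  )

# Variant 2: overwrite `tested` within each group, then keep the first
# row of each group so the other columns (here `date`) are retained.
first_row_per_year <- df |>
  mutate(
    tested = if_else(any(tested == "yes"), "yes", "no"),
    .by = c(name, clinic, year)
  ) |>
  distinct(name, clinic, year, .keep_all = TRUE)

one_per_year
first_row_per_year
```

Both variants return four rows for this example data. With variant 2, the `date` that is kept is simply the one from whichever row comes first in each group, so sort the data beforehand if a particular row should win.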
I would recommend reading this question before posting future questions. It makes it easier to help you.