I have a data frame (x)
. As you could see, I have different values (bp: yes/no and N) for the same ID, but sometimes, I have a unique ID with always bp ==no.
x
ID bp N
8012199 no 75
8012199 yes 2
8012211 no 118
8012211 yes 2
8012321 no 37
8012341 no 146
8012342 no 84
I would like to create a new variable (frequency: low and high
)
x
ID bp N frequency
8012199 no 75
8012199 yes 2 low
8012211 no 118
8012211 yes 4 high
8012321 no 37 low
8012341 no 146 low
8012342 no 84 low
based on different conditions:
What I've tried so far is
r <- x %>%
mutate(bp= case_when(
duplicated(ID)==FALSE ~ "low",
bp %in% "yes" & n <=2 ~ "low",
bp %in% "yes" & n > 2 ~ "high" ))
duplicated(x$ID)
FALSE TRUE FALSE TRUE FALSE FALSE FALSE
It works effectively for the "yes" condition, but not for the duplication. Any advice would be fantastic!
Your two examples are slightly different - N in row 4 has changed.
I've used the second example data:
x <- tibble::tribble(
~ID, ~bp, ~N,
"8012199", "no", 75,
"8012199", "yes", 2,
"8012211", "no", 118,
"8012211", "yes", 4,
"8012321", "no", 37,
"8012341", "no", 146,
"8012342", "no", 84
)
Here's a solution using the tidyverse:
library(tidyverse)
x %>%
group_by(ID) %>%
mutate(
id_count = n(),
frequency = case_when(
bp == "yes" & N <=2 ~ "low",
bp == "yes" & N > 2 ~ "high",
bp == "no" & id_count == 1 ~ "low",
TRUE ~ "")
) %>%
ungroup() %>%
select(-id_count)
Basically, the id_count
variable calculates the number of rows for each ID number. This is then used to generate the low
frequency results.
Also, notice that I use N <= 2 to represent "low" frequency. I've based this on the example you've included.