I have a dataframe with three variables:
For "Medication" multiple selections were possible in the questionnaire so now there are several numbers in each cell of the dataframe (separated by ";"). Is it possible to create a frequency table that counts each number (1-10) individually and have it grouped by gender? I need to have "Medication" (1-10) in rows and the counts grouped by gender as columns.
I already tried to split the columns so that each cell is left with only one value. But then I still have the problem to count the values across multiple columns. For each function I tried so far, the variables have been counted separately.
Thank you for your help!
And this is the expected output but without the combinations like "01;03"
Text parsing like this can be difficult without a specific, reproducible example. However, it sounds as though your data frame looks something like this:
df
#> ID medication gender
#> 1 1 9 f
#> 2 2 2;1;3 m
#> 3 3 6;2 d
#> 4 4 3 f
#> 5 5 7;8;7;1 f
#> 6 6 6;9;4;6 m
#> 7 7 9 d
#> 8 8 8;6;3 f
#> 9 9 9;7 d
#> 10 10 8;6 m
In which case, the pedestrian way to get your result in base R would be something like this:
meds <- lapply(split(df, df$gender),
function(x) unlist(strsplit(x$medication, ";\\s?")))
genders <- rep(c("d", "f", "m"), times = lengths(meds))
table(gender = genders, medication = unlist(meds))
#> medication
#> gender 1 2 3 4 5 6 7 8 9 10
#> d 0 1 0 0 0 1 1 0 2 0
#> f 1 0 2 0 0 1 2 2 1 0
#> m 1 1 1 1 0 3 0 1 1 0
Reproducible data
set.seed(2)
medication <- sapply(rpois(10, 2), function(x) {
if(x == 0) x <- 1
x <- sample(1:10, x, TRUE)
paste(x, collapse = ";")
})
gender <- sample(c("m", "f", "d"), 10, TRUE, prob = c(2, 2, 1))
df <- data.frame(ID = 1:10, medication = medication, gender = gender)
Created on 2022-02-06 by the reprex package (v2.0.1)