Search code examples
rcountfrequencymultipleselectioncrosstable

Frequency counts with a "multiple selection" variable


I have a dataframe with three variables:

  1. ID
  2. Gender (m,f,d)
  3. Medication (1,2,3,4,5,6,7,8,9,10)

For "Medication" multiple selections were possible in the questionnaire so now there are several numbers in each cell of the dataframe (separated by ";"). Is it possible to create a frequency table that counts each number (1-10) individually and have it grouped by gender? I need to have "Medication" (1-10) in rows and the counts grouped by gender as columns.

I already tried to split the columns so that each cell is left with only one value. But then I still have the problem to count the values across multiple columns. For each function I tried so far, the variables have been counted separately.

Thank you for your help!

Sorry, I don't know how to do it with dput. Here is a screenshot, I hope that helps! (Geschlecht = gender; Medikament = medication

And this is the expected output but without the combinations like "01;03"


Solution

  • Text parsing like this can be difficult without a specific, reproducible example. However, it sounds as though your data frame looks something like this:

    df
    #>    ID medication gender
    #> 1   1          9      f
    #> 2   2      2;1;3      m
    #> 3   3        6;2      d
    #> 4   4          3      f
    #> 5   5    7;8;7;1      f
    #> 6   6    6;9;4;6      m
    #> 7   7          9      d
    #> 8   8      8;6;3      f
    #> 9   9        9;7      d
    #> 10 10        8;6      m
    

    In which case, the pedestrian way to get your result in base R would be something like this:

    meds <- lapply(split(df, df$gender), 
                   function(x) unlist(strsplit(x$medication, ";\\s?")))
    
    genders <- rep(c("d", "f", "m"), times = lengths(meds))
    
    table(gender = genders, medication = unlist(meds))
    #>       medication
    #> gender 1 2 3 4 5 6 7 8 9 10
    #>      d 0 1 0 0 0 1 1 0 2  0
    #>      f 1 0 2 0 0 1 2 2 1  0
    #>      m 1 1 1 1 0 3 0 1 1  0
    

    Reproducible data

    set.seed(2)
    
    medication <- sapply(rpois(10, 2), function(x) {
      if(x == 0) x <- 1
      x <- sample(1:10, x, TRUE)
      paste(x, collapse = ";")
    })
    
    gender <- sample(c("m", "f", "d"), 10, TRUE, prob = c(2, 2, 1))
    
    df <- data.frame(ID = 1:10, medication = medication, gender = gender)
    

    Created on 2022-02-06 by the reprex package (v2.0.1)