Search code examples
rstringdataframecountunique

Count a list of specific strings in an R data frame


I have a data frame with 5 columns, but I am interested in one column "Conditions". Within conditions column, I need to find a way to count the number of specific entries in a cell. Each column cell can have one entry or more than one entry separated by ( , ) . So my data frame looks something like

S.NO                   Conditions
11            Eye Color 
12            Sound of your voice
13            Certain disease,Size of a palm,Eye Color
16            Eye Color,Hair color
17            Hair color,Height
18            Sound of your voice,Height

I want to count all the differnt entries/string at once. Total I have 35 list of different string in Conditions column and I want my Output something like this.

OUTPUT

Eye color   Sound of your voice   Certain disease    Size of a palm    Hair color   Height
    3           2                      1                   1              2          2

Solution

  • Since i do not know the exact structure of the data, i assumed the data to be as below

    data

    data <- tribble(
    ~Conditions, ~value,
    'Eye color', '3',
    'Sound of your voice', '2',
    'Certain disease, Size of a palm, Eye color', '1,1,2',
    'Eye color, Hair color', '2,2',
    'Hair color, Height', '3,1',
    'Sound of your voice, Height', '1,4'
    )
    

    for the above data we can write the below code to get the expected result

    code

    library(tidyverse)
    
    Conditions <- unlist(strsplit(data$Conditions,','))
    value <- unlist(strsplit(data$value,','))
    
    
    df <- bind_cols(Conditions,value) %>% setNames(c('Conditions', 'value')) %>% 
    mutate(across(everything(), ~trimws(.x)), value=as.numeric(value)) %>% 
      arrange(Conditions) %>% group_by(Conditions) %>% slice_head(n=1) %>% 
    mutate(row=row_number()) %>% 
      pivot_wider(names_from = Conditions, values_from =value)
    

    output

    # A tibble: 1 × 7
        row `Certain disease` `Eye color` `Hair color` Height `Size of a palm` `Sound of your voice`
      <int>             <dbl>       <dbl>        <dbl>  <dbl>            <dbl>                 <dbl>
    1     1                 1           3            2      1                1                     2