I have a data frame with 5 columns, but I am interested in one column "Conditions". Within conditions column, I need to find a way to count the number of specific entries in a cell. Each column cell can have one entry or more than one entry separated by ( , ) . So my data frame looks something like
S.NO Conditions
11 Eye Color
12 Sound of your voice
13 Certain disease,Size of a palm,Eye Color
16 Eye Color,Hair color
17 Hair color,Height
18 Sound of your voice,Height
I want to count all the differnt entries/string at once. Total I have 35 list of different string in Conditions column and I want my Output something like this.
OUTPUT
Eye color Sound of your voice Certain disease Size of a palm Hair color Height
3 2 1 1 2 2
Since i do not know the exact structure of the data, i assumed the data to be as below
data <- tribble(
~Conditions, ~value,
'Eye color', '3',
'Sound of your voice', '2',
'Certain disease, Size of a palm, Eye color', '1,1,2',
'Eye color, Hair color', '2,2',
'Hair color, Height', '3,1',
'Sound of your voice, Height', '1,4'
)
for the above data we can write the below code to get the expected result
library(tidyverse)
Conditions <- unlist(strsplit(data$Conditions,','))
value <- unlist(strsplit(data$value,','))
df <- bind_cols(Conditions,value) %>% setNames(c('Conditions', 'value')) %>%
mutate(across(everything(), ~trimws(.x)), value=as.numeric(value)) %>%
arrange(Conditions) %>% group_by(Conditions) %>% slice_head(n=1) %>%
mutate(row=row_number()) %>%
pivot_wider(names_from = Conditions, values_from =value)
# A tibble: 1 × 7
row `Certain disease` `Eye color` `Hair color` Height `Size of a palm` `Sound of your voice`
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 3 2 1 1 2