
How to split characters and calculate the corresponding frequency in R


Here is my data:

 [1] NA                                              NA                                             
 [3] NA                                              "EP, IP, RA, SH"
 [5] "EO, EP"                                        NA 

I split the data using str_split() from the stringr package:

da$name <- str_split(da$name, ",")

and the data become:

[[1]]
[1] NA

[[2]]
[1] NA

[[3]]
[1] NA

[[4]]
[1] "EP"  " IP" " RA"  " SH"

[[5]]
[1] "EO" " EP"         

[[6]]
[1] NA

I want to calculate the frequency of each value: NA, "EP", "IP", "RA", "SH" and "EO".

Is there a way to do that?


Solution

  • Probably not the best or most elegant way of doing it, but one possible solution is to unlist your strsplit result to get a single vector of all the individual values, and then count the occurrences of each distinct value:

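    library(dplyr)
    
    df <- data.frame(Vec = c(NA, NA, NA, "EP, IP, RA, SH", "EO, EP", NA))
    
    # Split each string on commas and flatten the resulting list into one vector
    vec <- unlist(strsplit(as.character(df$Vec), ","))
    
    # Count how often each distinct value occurs (NA gets its own row)
    as.data.frame(vec) %>% count(vec)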
    
    # A tibble: 7 x 2
      vec       n
      <fct> <int>
    1 " EP"     1
    2 " IP"     1
    3 " RA"     1
    4 " SH"     1
    5 "EO"      1
    6 "EP"      1
    7  NA       4
    

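    Note that " EP" and "EP" are counted separately here because splitting on "," keeps the space after each comma; applying trimws() to vec before counting merges them. Does it answer your question?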
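
If the space after each comma should not make " EP" and "EP" distinct values, the counting could be sketched in base R as follows. This is only a sketch under that assumption: the variable names x, tokens and freq are illustrative and not from the question, and the pattern ",\\s*" (a comma plus any whitespace after it) is one of several ways to drop the spaces.

```r
# Sample data mirroring the question (the name `x` is illustrative)
x <- c(NA, NA, NA, "EP, IP, RA, SH", "EO, EP", NA)

# Split on a comma plus any following whitespace, so no leading spaces survive;
# NA entries stay NA after splitting
tokens <- unlist(strsplit(x, ",\\s*"))

# Tabulate every distinct value; useNA = "ifany" keeps a count for NA
freq <- table(tokens, useNA = "ifany")
print(freq)

# The same counts as a data frame, if a tabular result is preferred
freq_df <- as.data.frame(freq, stringsAsFactors = FALSE)
```

With this data, "EP" is counted twice and NA four times. table() drops NA by default, which is why useNA is set explicitly here.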