r, stringi, rtweet

Count occurrences of multiple strings in one character variable


I have a dataset of tweets downloaded with rtweet, and I'd like to see how many times each of three different strings occurs in the variable x$mentions_screen_name.

The key thing I'm trying to do is count how many times 'A' occurs, then 'B', then 'C'. Here is my attempt at a reproducible example:

#These are the strings I would like to count
var<-c('A', 'B', 'C')
#The variable that contains the strings looks like this
library(stringi)
df<-data.frame(var1=stri_rand_strings(100, length=3, '[A-C]'))
#How do I count how many cases contain A, then B, and then C?
library(purrr)
df%>% 
  map(var, grepl(., df$var1))
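For comparison, here is a minimal sketch of what the purrr attempt seems to be aiming for. In the attempt above, the pipe passes df as the first argument of map(), so map() iterates over the columns of df instead of over var, and grepl() only reports whether each string matches (TRUE/FALSE), not how many times. The sketch maps over the patterns instead, using purrr's map_int() and set_names() together with stringr's str_count() and str_detect(). Wrapping each pattern in fixed() is my own choice, not from the question: it treats the pattern as a literal string rather than a regular expression. Since the question mentions both "how many times" and "how many cases contain", it computes both.

```r
library(purrr)
library(stringr)
library(stringi)

set.seed(1)
df <- data.frame(var1 = stri_rand_strings(100, length = 3, pattern = "[A-C]"),
                 stringsAsFactors = FALSE)
var <- c("A", "B", "C")

# Map over the patterns (each named by itself), not over df.
# Total number of times each pattern occurs across all strings:
occurrences <- map_int(set_names(var), ~ sum(str_count(df$var1, fixed(.x))))

# Number of strings (rows) that contain each pattern at least once:
rows_containing <- map_int(set_names(var), ~ sum(str_detect(df$var1, fixed(.x))))

occurrences
rows_containing
```

With single-letter patterns, fixed() and the default regex matching give the same result; the difference only matters for patterns containing regex metacharacters such as `.` or `+`.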

Solution

  • Another option using stringr and sapply could be:

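    library(stringr)
    library(stringi)  # stri_rand_strings() comes from stringi, not stringr
    set.seed(1)
    df<-data.frame(var1=stri_rand_strings(100, length=3, '[A-C]'))
    
    var<-c('A', 'B', 'C')
    # For each pattern x, count its matches in every string of df$var1
    # (sapply returns a 100 x 3 matrix), then sum each column
    colSums(sapply(var, function(x,y)str_count(y, x), df$var1 ))
    #A   B   C 
    #101 109  90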
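One caveat for the real tweet data: a pattern-based count like str_count() also matches substrings, so 'A' would be counted inside a screen name such as 'ABC'. If x$mentions_screen_name is a list column (one character vector of screen names per tweet, NA for tweets without mentions), as I believe it was in older rtweet releases (check with str(x) on your own data), counting exact matches may be closer to what you want. A minimal sketch, assuming that list-column layout and using a small made-up `tweets` data frame as a hypothetical stand-in for x:

```r
# Hypothetical stand-in for rtweet output: one list element per tweet,
# holding that tweet's mentioned screen names (NA if none).
tweets <- data.frame(id = 1:4)
tweets$mentions_screen_name <- list(c("A", "B"), NA, c("A", "C", "A"), "B")

var <- c("A", "B", "C")

# Flatten all mentions into one character vector, then count exact
# (whole screen name) matches for each name of interest.
all_mentions <- unlist(tweets$mentions_screen_name)
mention_counts <- vapply(var,
                         function(name) sum(all_mentions == name, na.rm = TRUE),
                         integer(1))
mention_counts
# A B C
# 3 2 1
```

table(factor(all_mentions, levels = var)) gives the same counts as a table, including zeros for names that never appear.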