
Count the frequency of strings in a dataframe R


I want to count how often certain substrings occur within a column of a dataframe.

strings  <- c("pi","pie","piece","pin","pinned","post")
df <- as.data.frame(strings)

I would then like to count how many of those strings contain each of the following substrings:

counts <- c("pi", "in", "pie", "ie")

To give me something like:

string  freq
 pi       5
 in       2
 pie      2
 ie       2

I have experimented with grepl and table, but I don't see how to specify which strings I want to search for.


Solution

  • You can use sapply() to iterate over counts and match each item against the strings column in df using grepl(). This returns a logical vector (TRUE for a match, FALSE otherwise), and summing that vector gives the number of matches.

    sapply(df, function(x) {
      sapply(counts, function(y) {
        sum(grepl(y, x))
      })
    })
    

    This will return:

        strings
    pi        5
    in        2
    pie       2
    ie        2
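
If you would rather get a two-column data frame in the string/freq layout from the question, something along these lines might work. This is only a sketch: the names freq and result are made up for illustration, and fixed = TRUE is an assumption that the search terms should be treated as literal substrings rather than regular expressions.

```r
strings <- c("pi", "pie", "piece", "pin", "pinned", "post")
df <- data.frame(strings, stringsAsFactors = FALSE)
counts <- c("pi", "in", "pie", "ie")

# For each search term, count how many entries of df$strings contain it.
# fixed = TRUE matches the term literally instead of as a regex (assumption).
freq <- vapply(
  counts,
  function(p) sum(grepl(p, df$strings, fixed = TRUE)),
  integer(1)
)

# Reshape the named vector into the string/freq layout from the question.
result <- data.frame(string = names(freq), freq = unname(freq))
print(result)
```

Note that grepl() reports whether an entry contains the term at all, so an entry is counted once even if the term appears in it several times.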