Tags: r, text, tidytext

Count only alphanumeric characters in a string


Given the string "This has 4 words!", I would like to count only the letters and digits, excluding whitespace and punctuation. For the string above, the result should be 13.

I'm not sure why, but I cannot get this to work in R.


Solution

  • We can use the [[:alnum:]] character class in str_count to count only the letters and digits

    library(stringr)
    str_count(str1, "[[:alnum:]]")
    #[1] 13
    

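    Or in base R, use gsub to remove both the [[:punct:]] and [[:space:]] characters, then get the number of remaining characters with nchar. Removing only [[:punct:]] would leave the three spaces in place and return 16.

    nchar(gsub("[[:punct:][:space:]]+", "", str1))
    #[1] 13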
    

    Or negate (^) the alphanumeric class to match every character that is not a letter or digit, replace those with an empty string (""), and get the nchar

    nchar(gsub("[^[:alnum:]]+", "", str1))
    #[1] 13
    

    Data

    str1 <- "This has 4 words!"
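
As a minimal sketch of the approaches above, the negated-class version can be wrapped in a small base R helper. The name count_alnum is hypothetical and not part of the original answer. The last call shows why stripping only punctuation overcounts for this task.

```r
# Hypothetical helper (not from the original answer): count letters and
# digits in each element of a character vector, using base R only.
count_alnum <- function(x) {
  # Drop every character that is not a letter or digit, then count the rest.
  nchar(gsub("[^[:alnum:]]", "", x))
}

str1 <- "This has 4 words!"

count_alnum(str1)
#[1] 13

# gsub and nchar are vectorized, so the helper works element-wise.
count_alnum(c(str1, "a-b c", ""))
#[1] 13  3  0

# Removing only punctuation keeps the three spaces, so the count is too high.
nchar(gsub("[[:punct:]]", "", str1))
#[1] 16
```

Note that which characters [[:alnum:]] matches can depend on the locale, so results for non-ASCII text may differ between systems.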