Tags: r, string, vector, text, stringr

Count empty strings?


In R, suppose I have a vector like:

vector<-c("Red", "   ", "", "5", "")

I want to count how many elements of this vector are blank, meaning they are either completely empty or consist only of spaces. For this very short vector the answer is three: the second, third, and fifth elements contain no letters, numbers, symbols, or other visible characters.

Is there a function or method that will count this? I want something I can use on larger vectors rather than inspecting every element by hand.


Solution

  • Use sum(grepl()) plus an appropriate regular expression:

    vector<-c("Red", "   ", "", "5", "")
    sum(grepl("^ *$", vector))
    
    • ^: beginning of string
    • " *" (a space followed by *): zero or more spaces
    • $: end of string

    If you want to look for "white space" more generally (e.g. allowing tabs), use "^[[:space:]]*$", although as pointed out in ?grep, the definition of white space is locale-dependent ...
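
To make the difference between the two patterns concrete, here is a minimal sketch. The vector `x` is a hypothetical extension of the example from the question, with two tab-containing elements added for illustration.

```r
# Hypothetical input: the question's vector plus two elements that contain tabs
x <- c("Red", "   ", "", "5", "", "\t", " \t ")

# Spaces only: the tab-containing elements are not counted
n_spaces_only <- sum(grepl("^ *$", x))

# POSIX white-space class: tabs (and newlines etc.) are counted as well
n_any_space <- sum(grepl("^[[:space:]]*$", x))

print(c(spaces_only = n_spaces_only, any_space = n_any_space))
```

With this input the first pattern counts 3 elements and the second counts 5, since only the second treats a tab as blank.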

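    length(grep(...)) would also work, as would sum(stringr::str_detect(vector, "^ *$")). Note that stringr::str_count() returns one count per element rather than a single total, so it would also need to be wrapped in sum().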
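
As a rough sketch of how these alternatives relate: grep() returns the positions of the matching elements, while grepl() returns one TRUE/FALSE per element, so one is counted with length() and the other with sum(). The stringr call is guarded because it assumes the package is installed; the variable names are only for illustration.

```r
vector <- c("Red", "   ", "", "5", "")

# grep() gives the indices of the blank elements
blank_idx <- grep("^ *$", vector)
n_grep <- length(blank_idx)

# grepl() gives a logical vector; TRUE counts as 1 in sum()
n_grepl <- sum(grepl("^ *$", vector))

# stringr equivalent, run only if the package is available
if (requireNamespace("stringr", quietly = TRUE)) {
  n_stringr <- sum(stringr::str_detect(vector, "^ *$"))
  print(n_stringr)
}

print(blank_idx)
print(c(grep = n_grep, grepl = n_grepl))
```

Keeping the indices from grep() is handy if you also want to inspect or drop the blank elements, not just count them.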

    For what it's worth:

    microbenchmark::microbenchmark(
        bolker  = sum(grepl("^ *$", vector)),
        rudolph = sum(!nzchar(trimws(vector))),
        baldur  = sum(gsub(" ", "", vector, fixed = TRUE) == ""),
        baldur2 = sum(!nzchar(gsub(" ", "", vector, fixed = TRUE)))
    )
    
    Unit: microseconds
        expr    min      lq     mean  median      uq    max neval cld
      bolker 10.499 10.8900 12.31869 11.8020 12.7990 40.976   100 a  
     rudolph 19.306 20.0125 22.01722 20.7990 22.9480 66.815   100  b 
      baldur  2.294  2.5700  2.76420  2.7455  2.8950  3.567   100   c
     baldur2  2.294  2.4740  2.66267  2.6450  2.7755  5.130   100   c
    

    (@RuiBarradas's answer is not included because it is very similar to @KonradRudolph's.) I'm surprised that @s_baldur's answer is so fast, but it is probably worth keeping in mind that this operation will be fast enough not to worry about efficiency unless it is a large part of your overall workflow ...
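
The timings above come from a five-element vector. If efficiency does matter for your data, a sketch like the following could be used to check that the four approaches agree on a larger input and to time them yourself. The vector `big` is a hypothetical test input (the example vector repeated 2000 times), and the benchmark runs only if microbenchmark is installed; no timings are claimed here.

```r
# Hypothetical larger input: the example vector repeated 2000 times (10,000 elements)
big <- rep(c("Red", "   ", "", "5", ""), times = 2000)

# All four approaches should give the same count on space-only blanks
counts <- c(
  bolker  = sum(grepl("^ *$", big)),
  rudolph = sum(!nzchar(trimws(big))),
  baldur  = sum(gsub(" ", "", big, fixed = TRUE) == ""),
  baldur2 = sum(!nzchar(gsub(" ", "", big, fixed = TRUE)))
)
print(counts)  # 3 blanks per copy x 2000 copies = 6000 for each approach

# Optional timing, assuming the microbenchmark package is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    bolker  = sum(grepl("^ *$", big)),
    rudolph = sum(!nzchar(trimws(big))),
    baldur  = sum(gsub(" ", "", big, fixed = TRUE) == ""),
    baldur2 = sum(!nzchar(gsub(" ", "", big, fixed = TRUE))),
    times = 20
  ))
}
```

Note that the approaches are only interchangeable when blanks consist of plain spaces: trimws() also strips tabs and newlines by default, whereas the "^ *$" pattern and the fixed-string gsub() do not.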