In R, suppose I have a vector like:
vector<-c("Red", " ", "", "5", "")
I want to count how many elements of this vector are effectively empty, meaning they consist only of spaces or of nothing at all. For this very short vector the answer is three: the second, third, and fifth elements contain no letters, numbers, symbols, etc.
Is there any function or method that will count this? I wanted something I could use on larger vectors rather than just looking at every element of the vector.
Use sum(grepl()) plus an appropriate regular expression:
vector<-c("Red", " ", "", "5", "")
sum(grepl("^ *$", vector))
^ : beginning of string
 * : zero or more spaces
$ : end of string
If you want to look for "white space" more generally (e.g. allowing tabs), use "^[[:space:]]*$", although as pointed out in ?grep, the definition of white space is locale-dependent ...
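length(grep(...)) would also work, or sum(stringr::str_count(vector, "^ *$")) (wrapped in sum() because str_count() returns one count per element rather than a single total).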
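To make the difference between the two patterns concrete, here is a minimal sketch. The vector x is a made-up extension of the question's example with two tab-containing elements added; it is not from the original question.

```r
# Hypothetical test vector: the question's example plus two elements with tabs
x <- c("Red", " ", "", "5", "", "\t", " \t ")

# Spaces only: elements containing a tab are NOT counted as blank
n_spaces <- sum(grepl("^ *$", x))

# Any whitespace ([[:space:]] is locale-dependent, see ?regex)
n_white <- sum(grepl("^[[:space:]]*$", x))

# Same count as n_spaces, via the indices of the matching elements
n_grep <- length(grep("^ *$", x))

print(c(spaces_only = n_spaces, any_whitespace = n_white, via_grep = n_grep))
```

With this x, the space-only pattern counts 3 elements and the [[:space:]] pattern counts 5, since the last two elements contain tabs.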
For what it's worth:
microbenchmark::microbenchmark(
bolker = sum(grepl("^ *$", vector)),
rudolph = sum(! nzchar(trimws(vector))),
baldur = sum(gsub(" ", "", vector, fixed = TRUE) == ""),
baldur2 = sum(! nzchar(gsub(" ", "", vector, fixed = TRUE))))
Unit: microseconds
    expr    min      lq     mean  median      uq    max neval cld
  bolker 10.499 10.8900 12.31869 11.8020 12.7990 40.976   100   a
 rudolph 19.306 20.0125 22.01722 20.7990 22.9480 66.815   100   b
  baldur  2.294  2.5700  2.76420  2.7455  2.8950  3.567   100   c
 baldur2  2.294  2.4740  2.66267  2.6450  2.7755  5.130   100   c
(@RuiBarradas's answer is not included because it is very similar to @KonradRudolph's.) I'm surprised that @s_baldur's answer is so fast ... but it is probably also worth keeping in mind that this operation will be fast enough that you needn't worry about efficiency unless it is a large part of your overall workflow ...
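The timings above are for the five-element vector, so if speed matters it may be worth repeating the comparison on data closer to your own. A rough sketch, assuming a made-up vector big of 100,000 elements sampled from a few example values (the name, size, and values are illustrative only), which first checks that the approaches agree:

```r
# Hypothetical larger input: 1e5 elements drawn from a few example values
set.seed(1)
big <- sample(c("Red", " ", "", "5", "   "), 1e5, replace = TRUE)

# The three approaches from the benchmark, applied to the larger vector
counts <- c(
  regex  = sum(grepl("^ *$", big)),
  trimws = sum(!nzchar(trimws(big))),
  gsub   = sum(!nzchar(gsub(" ", "", big, fixed = TRUE)))
)

# All three should report the same number of blank elements
stopifnot(length(unique(counts)) == 1)
print(counts)
```

Once the counts agree, the same three expressions can be passed to microbenchmark::microbenchmark() with big in place of vector. Note that trimws() also strips tabs and newlines by default, so it can differ from the space-only versions on data containing those characters.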