
Count number of factor levels that match a character string in R


I have a factor with several levels, and I am trying to count how many of those levels contain a given string.

Given this factor:

exdata <- factor(c("Test1","Test2","Sample1","Sample2","Test1","Test2","Sample3"))

I want to find the number of levels in exdata that contain "Sample" and the number that contain "Test".

My solution thus far has been to use nlevels, droplevels, and grep:

nlevels(droplevels(exdata[grep("Test",exdata)]))
# Correct/intended answer is 2
nlevels(droplevels(exdata[grep("Sample",exdata)]))
# Correct/intended answer is 3

Is there a more concise way to do this?


Solution

  • Use levels() to get the factor's levels as a character vector, then search that vector directly:

    levels(exdata)
    # [1] "Sample1" "Sample2" "Sample3" "Test1"   "Test2"  
    

    So you can do two individual calls ...

    length(grep("Sample", levels(exdata), fixed = TRUE))
    # [1] 3
    length(grep("Test", levels(exdata), fixed = TRUE))
    # [1] 2
    

    Or in one go ...

    f <- function(x) length(grep(x, levels(exdata), fixed = TRUE))
    sapply(c("Sample", "Test"), f)
    # Sample   Test 
    #      3      2
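
A possible variation on the approach above, as a minimal sketch: grepl() returns a logical vector, so sum() counts the matches directly, and vapply() guarantees an integer result per pattern. The helper name count_levels is hypothetical and not part of base R.

```r
exdata <- factor(c("Test1", "Test2", "Sample1", "Sample2", "Test1", "Test2", "Sample3"))

# Hypothetical helper: count the levels of factor `f` that contain
# the literal string `pattern` (fixed = TRUE disables regex matching).
count_levels <- function(f, pattern) {
  sum(grepl(pattern, levels(f), fixed = TRUE))
}

# One call per pattern; vapply() checks that each result is a single integer.
counts <- vapply(c("Sample", "Test"),
                 function(p) count_levels(exdata, p),
                 integer(1))
counts
```

Note that levels() also returns levels that never occur in the data. If only levels actually present should be counted, as in the question's droplevels() approach, one option is to call count_levels(droplevels(exdata), "Test") instead.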