
Extracting number of unique observations using a string


I have a column like this:

    data.frame(x = c("ABC1","ABD1","ABE1","ABF1","ABG1","ABC2","ABC2","ABF2","ABE2"))

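I want to find out how many unique observations there are, where an observation is identified by "AB" plus the letter that follows it (the trailing digit is ignored). So ABC1 and ABC2 count as the same observation, but ABC1 and ABD1 are different.

In this example, there would be 5 unique observations (ABC, ABD, ABE, ABF and ABG).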


Solution

  • You can keep only the first 3 characters of each value, then count the number of unique results.

    df <- data.frame(x = c("ABC1","ABD1","ABE1","ABF1","ABG1","ABC2","ABC2","ABF2","ABE2"), stringsAsFactors = FALSE)
    
    # Keep the first three characters ("AB" + letter) and count the distinct ones
    length(unique(substr(df$x, 1, 3)))
    # [1] 5
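If some values might not start with "AB" followed by a letter, a regex-based variant avoids counting them. This is a minimal sketch, assuming the "AB" + letter pattern must appear at the start of the string; `regexpr()` finds the first match in each value and `regmatches()` returns only the matched substrings, silently dropping values that do not match.

```r
df <- data.frame(x = c("ABC1", "ABD1", "ABE1", "ABF1", "ABG1",
                       "ABC2", "ABC2", "ABF2", "ABE2"),
                 stringsAsFactors = FALSE)

# Extract "AB" + one letter from the start of each value
# (assumption: the pattern is anchored at the beginning of the string)
matches <- regmatches(df$x, regexpr("^AB[A-Za-z]", df$x))

# Count the distinct prefixes
n_unique <- length(unique(matches))
print(n_unique)  # [1] 5
```

Unlike `substr(df$x, 1, 3)`, a value such as "XYZ1" would be skipped here rather than counted as its own prefix.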