
Trying to extract/count the unique characters in a string (of class character)


Hi, what I am trying to do is count the number of unique characters in each string. Here is what my data frame looks like:

Text            unique char count
banana              3
banana12            5
Ace@343             6

Upper/lower case doesn't matter; what I want in the output is the number of unique characters (numbers, letters) in each string.

I have tried functions such as unique and distinct, but they return the result for the entire column, whereas I need it for each corresponding cell, as shown above.


Solution

  • In base R you can do:

    df$char_count <- sapply(strsplit(df$Text, ""), function(x) length(unique(x)))
    
    df
    #>       Text char_count
    #> 1   banana          3
    #> 2 banana12          5
    #> 3  Ace@343          6
    

    Data

    df <- data.frame(Text = c("banana", "banana12", "Ace@343"))
    

    Created on 2021-11-12 by the reprex package (v2.0.0)
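
The question says upper/lower case should not matter, while the one-liner above treats "A" and "a" as different characters (the sample data happens not to expose this). A minimal sketch of a case-insensitive variant follows, assuming that "doesn't matter" means the count should ignore case. The helper name `count_unique_chars`, its `ignore_case` argument, and the extra "Anna" row are illustrative and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): counts the distinct
# characters in each element of a character vector.
count_unique_chars <- function(x, ignore_case = TRUE) {
  x <- as.character(x)              # also copes with factor columns
  if (ignore_case) x <- tolower(x)  # fold case so "A" and "a" count once
  # Split each string into single characters, then count the distinct ones.
  vapply(strsplit(x, ""), function(chars) length(unique(chars)), integer(1))
}

# Same data as above, plus one illustrative row with mixed case.
df <- data.frame(Text = c("banana", "banana12", "Ace@343", "Anna"))
df$char_count <- count_unique_chars(df$Text)
df
```

With `ignore_case = TRUE`, "Anna" counts as 2 unique characters; with `ignore_case = FALSE` it counts as 3, which matches the behaviour of the original one-liner. `vapply` is used here instead of `sapply` only so that the result is guaranteed to be an integer vector.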