
Getting a unique count from structured text data


I am wondering how to get the number of unique fruit codes in each text string of a structured dataset. This is a follow-up question to my previous post. I would like to get a unique count of apples (coded as App), bananas (coded as Ban), pineapples (coded as Pin), and grapes (coded as Grp).

    text <- c('AppPinAppBan', 'AppPinOra', 'AppPinGrpLonNYC')
    df <- data.frame(text)

    library(stringr)
    # counts all matches of the fruit codes in each string
    df$fruituniquecount <- str_count(df$text, "App|Ban|Pin|Grp")

    ## I am expecting output as follows:

    text             fruituniquecount
    AppPinAppBan     3
    AppPinOra        2
    AppPinGrpLonNYC  3
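The gap between the attempt and the expected column is that a repeated code (the second `App` in `AppPinAppBan`) is counted once per occurrence. A minimal sketch that makes this visible, assuming only base R: `regmatches()`/`gregexpr()` stand in for the stringr calls here so the snippet runs without packages, and it counts all matches and unique matches side by side.

```r
# Base R only: regmatches()/gregexpr() stand in for the stringr calls
text <- c('AppPinAppBan', 'AppPinOra', 'AppPinGrpLonNYC')
df <- data.frame(text, stringsAsFactors = FALSE)

pattern <- "App|Ban|Pin|Grp"

# One character vector of matched fruit codes per string
matches <- regmatches(df$text, gregexpr(pattern, df$text))

# Every match counts, so the repeated "App" in row 1 is counted twice
total_matches <- lengths(matches)
total_matches
#> [1] 4 2 3

# Dropping repeats before counting gives the expected column
unique_matches <- lengths(lapply(matches, unique))
unique_matches
#> [1] 3 2 3
```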

Solution

  • Following the same idea as the accepted answer to your previous question, you can do:

    library(stringr)

    # Extract every fruit code from each string, then count the distinct ones
    sapply(str_extract_all(df$text, "App|Ban|Pin|Grp"), function(i) length(unique(i)))
    #[1] 3 2 3
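If the data really is structured as back-to-back codes of exactly three characters (an assumption based on the examples, not something the question states), another option is to cut each string into fixed-width tokens instead of scanning it with a regex. The sketch below uses base R only; `split_codes()` and `count_unique_fruit()` are hypothetical helper names introduced for illustration, and the result is stored in the `fruituniquecount` column the question asks for.

```r
# Alternative sketch: ASSUMES every code is exactly 3 characters wide,
# so each string can be cut into fixed-width tokens
text <- c('AppPinAppBan', 'AppPinOra', 'AppPinGrpLonNYC')
df <- data.frame(text, stringsAsFactors = FALSE)

fruit_codes <- c("App", "Ban", "Pin", "Grp")

# Hypothetical helper: split one string into consecutive `width`-character tokens
split_codes <- function(s, width = 3) {
  if (nchar(s) == 0) return(character(0))
  starts <- seq(1, nchar(s), by = width)
  substring(s, starts, starts + width - 1)
}

# Hypothetical helper: keep only fruit tokens, drop repeats, count what is left
count_unique_fruit <- function(s) {
  tokens <- split_codes(s)
  length(unique(tokens[tokens %in% fruit_codes]))
}

df$fruituniquecount <- vapply(df$text, count_unique_fruit, integer(1),
                              USE.NAMES = FALSE)
df$fruituniquecount
#> [1] 3 2 3
```

Under that fixed-width assumption, the token approach cannot match a code that straddles two neighbouring codes (for example `App` inside `XAp` followed by `pYZ`), which the regex approach would count. If the codes are not all three characters long, the regex version is the safer choice.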