r, hierarchical-clustering, segment

Segment a binary string with 'which' statements in R


I have tried many clustering algorithms on my dataset and would now like to apply managerial segmentation to my data using 'which' statements. I am wondering whether it makes more sense to segment on customer_math or on the years, which run from X1 to X8. Doing managerial segmentation on X1-X8 is clear to me, but I don't know how to do it on the string.

Here is my df:

   customer_id customer_math X1 X2 X3 X4 X5 X6 X7 X8
1   15251       10001010      1  0  0  0  1  0  1  0
2   10101       11111111      1  1  1  1  1  1  1  1
3   84787       10101010      1  0  1  0  1  0  1  0

For instance, I would like to answer the following questions:

  1. Customers who had a "zero" once
  2. Customers who had a "zero" twice in a row
  3. Customers who left and came back, i.e. at least one zero in the string and a 1 at the end of the string

Thank you very much for your feedback!


Solution

  • If I understood correctly:

    library(stringr)
    # Q1: exactly one "0" in the string
    q1 <- df[str_count(df$customer_math, "0") == 1, ]
    # Q2: at least two zeros in a row. This simple pattern also matches longer runs ("000", ...),
    #     not only an isolated "00"; refine the pattern if you need exactly two
    q2 <- df[grepl("00", df$customer_math), ]
    # Q3: at least one zero in the string and a 1 in the last period (X8)
    q3 <- df[str_count(df$customer_math, "0") >= 1 & df$X8 == 1, ]
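If "twice in a row" should mean a gap of exactly two zeros, one possible refinement is to measure each run of zeros with base R's `rle()` and filter with `which()`, as the question asks. This is only a sketch under assumptions: `customer_math` is stored as character (a numeric column would lose leading zeros), `zero_runs` is a hypothetical helper name, and the fourth customer (`99999`) is a made-up row added purely so the filters return something.

```r
# Toy data: the first three rows come from the question; the fourth row is
# hypothetical and only there to give the filters a match.
df <- data.frame(
  customer_id   = c(15251, 10101, 84787, 99999),
  customer_math = c("10001010", "11111111", "10101010", "11001101"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lengths of every run of consecutive zeros in one string.
zero_runs <- function(s) {
  r <- rle(strsplit(s, "", fixed = TRUE)[[1]])
  r$lengths[r$values == "0"]
}

runs <- lapply(df$customer_math, zero_runs)

# Customers with at least one gap of exactly two zeros (a run of three does not count).
has_gap_of_two <- vapply(runs, function(x) any(x == 2), logical(1))
exactly_two <- df[which(has_gap_of_two), ]

# Customers who left and came back, read from the string alone:
# a zero somewhere and a 1 as the last character.
came_back <- df[which(grepl("0", df$customer_math) & grepl("1$", df$customer_math)), ]
```

On the three customers from the question alone, neither filter matches anything, which is why the sketch adds the extra row. Working on the string rather than on `X8` keeps the logic independent of how many period columns the data frame has.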