python-3.x, pandas, machine-learning, scikit-learn, unsupervised-learning

Is there a way to use an unsupervised method in scikit-learn to classify lists into different groups?


I have a number of instances, and each instance has its own list representing the steps it follows. For example:

1284 -> [0, 100, 200, 100, 200, 300, 600]
1285 -> [0, 100, 200, 100, 200, 300, 500, 999]
1286 -> [0, 100, 200, 300, 600]
...
13023 -> [0, 100, 170, 100, 200]

For example, instance 1284 goes through the steps from 0 to 600 like this:

0 -> 100
100 -> 200
200 -> 100
100 -> 200
200 -> 300
300 -> 600
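
The transitions above can be derived by pairing each step with its successor. A minimal sketch, assuming the path is a plain Python list (the names `path_1284` and `transitions` are illustrative only):

```python
# Path of instance 1284, copied from the example above
path_1284 = [0, 100, 200, 100, 200, 300, 600]

# Pair each step with the one that follows it
transitions = list(zip(path_1284, path_1284[1:]))

for src, dst in transitions:
    print(src, "->", dst)
```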

I have managed to get the path of each instance, but I want to find the instances that contain loops and classify them. For example, instance 1284 goes through steps 100 and 200 twice.

I would like to know how to do that. I thought of unsupervised classification with scikit-learn, but I'm not familiar with it and I don't know how to classify those lists.

Any help would be much appreciated. Thanks!


Solution

  • I think you can use the following trick to do this without any machine learning:

    1. Convert the list of steps into a set.
    2. Compare the size of the set to the size of the original list.
    3. If the sizes are the same, all steps were distinct.
    4. Otherwise, at least one step was repeated, so there was a loop.

    I based this algorithm on the assumption that if there are no loops then all steps will be distinct.

    list_1284 = [0, 100, 200, 100, 200, 300, 600]

    # A set keeps only distinct values, so any repeated step makes it smaller
    set_1284 = set(list_1284)

    if len(set_1284) != len(list_1284):
        print("There exists a loop")
    else:
        print("No loop exists")
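
The same idea can be extended to all instances at once, grouping them by which steps they repeat. This is only a sketch under assumptions: the paths are taken to be stored in a dict keyed by instance id, and the names `paths`, `repeated_steps` and `group_by_loop` are hypothetical helpers, not part of any library. It also treats any repeated step as a loop, as the algorithm above does.

```python
from collections import Counter, defaultdict

# Paths copied from the question, keyed by instance id (assumed layout)
paths = {
    1284: [0, 100, 200, 100, 200, 300, 600],
    1285: [0, 100, 200, 100, 200, 300, 500, 999],
    1286: [0, 100, 200, 300, 600],
    13023: [0, 100, 170, 100, 200],
}

def repeated_steps(path):
    """Return a sorted tuple of the steps that occur more than once in path."""
    counts = Counter(path)
    return tuple(sorted(step for step, n in counts.items() if n > 1))

def group_by_loop(paths):
    """Group instance ids by the set of steps they repeat."""
    groups = defaultdict(list)
    for instance_id, path in paths.items():
        groups[repeated_steps(path)].append(instance_id)
    return dict(groups)

groups = group_by_loop(paths)
for steps, ids in groups.items():
    # An empty tuple means no step was repeated
    print(steps or "no loop", "->", ids)
```

Instances that share a key repeat exactly the same steps, so the dict keys act as the groups without any clustering model.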