python, python-3.x, pandas, for-loop, anomaly-detection

Python: print only the rows that satisfy an if condition inside a for loop (anomaly detection for excessiveness)


I applied an anomaly detection algorithm to my data, and now I want to find out how many times a certain value appears among the rows the anomaly vector marks as normal. I want to find anomalies of excessiveness, and my professor won't let me use statistical methods, so I'm trying an anomaly detection algorithm with 90% contamination: the rows it considers normal will be the ones whose values appear most often, and I will interpret those as the anomalies. It's just a test that will probably fail, but to prove that, I need to print how many times a certain value appears in the rows that are considered normal. Here, outliers is the vector that tells me whether each row is normal or an anomaly (1 or -1), and df5 is my DataFrame with the data. This is what I'm trying. I'm doing it for a single value because that seems simpler, but even then I'm failing.

    value=1
    for i in range(len(outliers)):
        if outliers[i] == value:
            print(df5.loc[df5['actor']==931])

It returns all the rows where the feature 'actor' is 931, repeated once for every row flagged as normal, but I want it to return only the rows where 'actor' is 931 and that row is considered normal. I've tried every way I know.
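For the counting goal described in the question, a vectorized sketch is shown below. It assumes outliers is a 1-D array of 1/-1 flags with exactly one entry per row of df5, in the same row order; the toy values are hypothetical and only stand in for the real data. Instead of looping, it builds one boolean mask of the normal rows and counts inside that subset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real df5 and outliers
df5 = pd.DataFrame({"actor": [931, 12, 931, 931, 40, 12]})
outliers = np.array([1, -1, 1, -1, 1, 1])  # 1 = normal, -1 = anomaly, one flag per row

# Boolean mask of the rows the detector marked as normal
normal = outliers == 1

# How many normal rows have actor == 931
count_931 = int((df5.loc[normal, "actor"] == 931).sum())
print(count_931)

# Count of every actor value among the normal rows, most frequent first
counts = df5.loc[normal, "actor"].value_counts()
print(counts)
```

The value_counts() call answers the question for all values at once, so there is no need to repeat the test for each actor separately.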


Solution

  • I think I understand now what you are trying to do. The 931 confused me for a while, but now I think you are just trying to select all the 'normal' cases whose actor value is 931 (it's just a label). If you print like this, you select all the rows with label 931 EACH TIME the loop finds a normal row. That's not what you want. So first select the rows with actor == 931 together with their corresponding outlier values. Then simply do:

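    value = 1
    # first keep only the rows labelled actor == 931, together with their outlier flags
    mask = (df5['actor'] == 931).to_numpy()
    filtered_outliers = [o for o, keep in zip(outliers, mask) if keep]
    actual_index = df5.index[mask]  # original row labels, stored while filtering
    for i in range(len(filtered_outliers)):
        if filtered_outliers[i] == value:  # then normal, not excessiveness
            print(i)  # index within the filtered array where actor == 931
            print(actual_index[i])  # corresponding index in the original df5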
    

You probably need the actual index in the original array, so store it when you first filter the actor == 931 cases. Does this make sense to you, Mariana? Let me know :)
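As an alternative to the loop, both conditions can be combined into a single boolean mask, which returns the matching rows directly together with their original index labels. This is a minimal sketch, assuming outliers is a NumPy array of 1/-1 flags aligned row-by-row with df5; the DataFrame and its index values below are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; replace with the real df5 and outliers
df5 = pd.DataFrame({"actor": [931, 12, 931, 931, 40]}, index=[10, 11, 12, 13, 14])
outliers = np.array([1, 1, -1, 1, -1])  # one 1/-1 flag per row of df5, same order

# to_numpy() makes the combination positional, matching the order of outliers
mask = (df5["actor"] == 931).to_numpy() & (outliers == 1)

normal_931 = df5[mask]
print(normal_931)                  # the rows where actor == 931 AND the row is normal
print(normal_931.index.tolist())   # original index labels, no separate actual_index needed
```

Because the filtered DataFrame keeps df5's own index, the original row positions come for free, and len(normal_931) gives the count the question asks for.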