Search code examples
pythonpandasdataframefilteriqr

how to delete single values based on IQR filtering from dataframe


I have a dataframe with around 80 columns and a few hundreds of rows, below is an example dataframe. I need to filter the dataframe based on the IQR value then delete the outliers but not the whole row, only the actual value/cell. As far I could define the Q1, Q3 and IQR.

I have found some solutions but they remove the whole row but I want to delete just the outlier value/cell.

list1 = [-1200, 0.75, 85, 92, 115, 116, 227, 252, 271, 308, 314, 443, 509, 717, 793, 805, 891, 903, 1958, 2978]
list2 = [-903, 0.68, 90, 109, 129, 164, 351, 365, 440, 472, 538, 547, 591, 679, 682, 706, 776, 785, 787, 2886]
list3 = [-1123, 0.033, 73, 182, 296, 355, 438, 509, 528, 530, 567, 608, 613, 632, 653, 727, 745, 761, 780, 1985]
list4 = [-1328, 0.363, 89, 106, 131, 165, 171, 202, 335, 339, 379, 399, 425, 488, 513, 657, 728, 761, 840, 1844]

df = pd.DataFrame ({'A' : list1, 'B' : list2, 'C' : list3, 'D' : list4})
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1

So the output dataframe should look like this:

output dataframe


Solution

  • You can try something like:

    df[(df < Q1 - 1.5*IQR) | (df>Q3 + 1.5*IQR)] = pd.NA      
    

    To set all the outliers to NA.