python, list, range

Python: Find outliers inside a list


I have a list with an arbitrary number of integers and/or floats. I want to find the outliers among these numbers (I hope that's the right term). For example:

list = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]
  • 90 to 99% of my integer values are between 1 and 20
  • sometimes there are values that are much higher, say around 100 or 1,000 or even more

My problem is that these values can be different every time. The regular range might be somewhere between 1,000 and 1,200, with the exceptions around half a million.

Is there a function to filter out these special numbers?


Solution

  • Assuming your list is l:

    • If you know you want to filter a certain percentile/quantile, you can use:

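      This removes the values at or below the 10th percentile and at or above the 90th percentile, i.e. roughly the bottom 10% and the top 10%. You can change either cut-off as needed (for example, you can drop the lower bound and only filter out values above the 90th percentile, as in your example):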

      import numpy as np
      l = np.array(l)
      # keep only values strictly between the 10th and 90th percentiles
      l = l[(l > np.quantile(l, 0.1)) & (l < np.quantile(l, 0.9))].tolist()
      print(l)
      

      output:

      [3, 2, 14, 2, 8, 4, 3, 5]
      
    • If you are not sure of the percentile cut-off and are looking to remove outliers:

      You can tune the cut-off by adjusting the argument m in the function call. The larger m is, the fewer outliers are removed. The function measures each value's distance from the median in units of the median absolute deviation, so it seems to be more robust to various types of outliers than other outlier-removal techniques.

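      import numpy as np
      l = np.array(l)

      def reject_outliers(data, m=6.):
          # absolute distance of each value from the median
          d = np.abs(data - np.median(data))
          # median absolute deviation (MAD)
          mdev = np.median(d)
          # distance in units of the MAD; divide by 1 if the MAD is zero
          s = d / (mdev if mdev else 1.)
          return data[s < m].tolist()

      print(reject_outliers(l))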
      

      output:

      [1, 3, 2, 14, 2, 1, 8, 1, 4, 3, 5]
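
Since the question mentions that the regular range can shift (say 1,000 to 1,200, with exceptions around half a million), here is a minimal sketch that runs the same median-based filter on both scales with the default m. The second list is made-up sample data, not from the question, and the function is restated so that it accepts a plain list directly; treat it as an illustration under those assumptions rather than a tuned solution.

```python
import numpy as np

def reject_outliers(values, m=6.0):
    """Keep values whose distance from the median is below m times the MAD."""
    data = np.asarray(values)
    d = np.abs(data - np.median(data))   # distance of each value from the median
    mdev = np.median(d)                  # median absolute deviation (MAD)
    s = d / (mdev if mdev else 1.0)      # divide by 1 if the MAD is zero
    return data[s < m].tolist()

# List from the question: regular values between 1 and 20.
small_scale = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]

# Hypothetical data for the second scenario: regular values around
# 1,000 to 1,200, exceptions around half a million.
large_scale = [1000, 1150, 1020, 500000, 1200, 1080, 1100, 480000, 1040, 1190]

print(reject_outliers(small_scale))
print(reject_outliers(large_scale))
```

Because the distances are scaled by the MAD of each list, the same m applies to both without knowing the regular range in advance. If most values in a list are identical, the MAD is zero and the fallback divisor of 1 compares raw distances against m, so m may need adjusting in that case.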