Search code examples
pythondatetrim

How to trim outliers in dates in python?


I have a dataframe df:

0    2003-01-02
1    2015-10-31
2    2015-11-01
16   2015-11-02
33   2015-11-03
44   2015-11-04

and I want to trim the outliers in the dates. So in this example I want to delete the row with the date 2003-01-02. Or in bigger data frames I want to delete the dates who do not lie in the interval where 95% or 99% lie. Is there a function who can do this ?


Solution

  • You could use quantile() on Series or DataFrame.

    dates = [datetime.date(2003,1,2),
             datetime.date(2015,10,31),
             datetime.date(2015,11,1),
             datetime.date(2015,11,2),
             datetime.date(2015,11,3),
             datetime.date(2015,11,4)]
    df = pd.DataFrame({'DATE': [pd.Timestamp(x) for x in dates]})
    print(df)
    
    qa = df['DATE'].quantile(0.1) #lower 10%
    qb = df['DATE'].quantile(0.9) #higher 10%
    
    print(qa, qb)
    
    #remove outliers
    xf = df[(df['DATE'] >= qa) & (df['DATE'] <= qb)]
    print(xf)
    

    The output is:

            DATE
    0 2003-01-02
    1 2015-10-31
    2 2015-11-01
    3 2015-11-02
    4 2015-11-03
    5 2015-11-04
    2009-06-01 12:00:00 2015-11-03 12:00:00
            DATE
    1 2015-10-31
    2 2015-11-01
    3 2015-11-02
    4 2015-11-03