Tags: python, python-3.x, pandas, data-science, data-analysis

Creating multiple quantile outputs based on condition of a value of a specific row


I have a df that has columns such as below:

    xx_id  interval_start_time      interval_end_time        percentage_rate
    bd63   2019-04-01 20:00:00.000  2019-04-01 20:30:00.000  0.208
    a519   2019-04-01 22:00:00.000  2019-04-01 22:30:00.000  0.083

I want to calculate percentiles of the percentage_rate column (several at once, e.g. p5, p25, p50, p75, p90), grouped by the interval_start_time column. Ideally I could pass in a day and time for interval_start_time and get back the percentiles or quantiles for that value, so I can repeat this for the different days and time intervals the column contains. How can I do this?

Thanks for the time


Solution

  • df.groupby(df['interval_start_time'].dt.day)['percentage_rate'].apply(lambda x: np.percentile(x, [25,50,75,100]))
    

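    You can change "day" to whatever you want to group your dates by. Make sure df['interval_start_time'] is a datetime column and that numpy is imported as np. Note that .dt.day is the day of the month, so the same day number in different months ends up in one group; .dt.date groups by calendar date instead. You can also change the values in [] to whichever percentiles you're interested in, e.g. [5, 25, 50, 75, 90].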
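
    As a rough sketch of the same idea, pandas' own groupby quantile can return one column per percentile, which makes it easy to look up a specific day or start time afterwards. The sample rows beyond the two in the question, the helper name rate_quantiles, and the p5...p90 column labels are all made up for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data shaped like the table in the question;
# only the first two rows come from the question itself.
df = pd.DataFrame({
    "xx_id": ["bd63", "a519", "c001", "c002", "c003", "c004"],
    "interval_start_time": pd.to_datetime([
        "2019-04-01 20:00:00", "2019-04-01 22:00:00",
        "2019-04-01 20:00:00", "2019-04-01 20:00:00",
        "2019-04-02 20:00:00", "2019-04-02 20:00:00",
    ]),
    "percentage_rate": [0.208, 0.083, 0.100, 0.300, 0.050, 0.150],
})

# Quantiles are given as fractions here (0.05 == p5), unlike np.percentile.
QUANTILES = [0.05, 0.25, 0.50, 0.75, 0.90]


def rate_quantiles(frame, key):
    """Return one row per group and one column per requested quantile."""
    out = frame.groupby(key)["percentage_rate"].quantile(QUANTILES).unstack()
    out.columns = [f"p{int(round(q * 100))}" for q in out.columns]
    return out


# Group by the exact start timestamp (day and time together).
by_interval = rate_quantiles(df, "interval_start_time")

# Group by calendar date only, ignoring the time of day.
by_date = rate_quantiles(df, df["interval_start_time"].dt.date)

# Look up the percentiles for one specific start time or one specific date.
print(by_interval.loc[pd.Timestamp("2019-04-01 20:00:00")])
print(by_date.loc[pd.Timestamp("2019-04-02").date()])
```

    Passing the column name groups by the full timestamp, while passing df["interval_start_time"].dt.date (or .dt.hour, .dt.day, and so on) changes the granularity without touching the rest of the code.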