
Python Pandas Calculating Percentile per row


I have the following DataFrame and would like to add a new column that holds the 99th percentile of each row, i.e. one value per Transaction Number and Description.

I am really struggling to achieve this. Most posts I have found only cover calculating the percentile down a column.

Is there a way to achieve this? I would expect a new column to be created with two rows.

import pandas as pd

df_baseScenario = pd.DataFrame({'Transaction Number' : [1,10],
                            'Description'      :['asf','def'],
                            'Calc_PV_CF_2479.0':[4418494.085,-3706270.679],
                            'Calc_PV_CF_2480.0':[4415476.321,-3688327.494],
                            'Calc_PV_CF_2481.0':[4421698.198,-3712887.034],
                            'Calc_PV_CF_2482.0':[4420541.944,-3706402.147],
                            'Calc_PV_CF_2483.0':[4396063.863,-3717554.946],
                            'Calc_PV_CF_2484.0':[4397897.082,-3695272.043],
                            'Calc_PV_CF_2485.0':[4394773.762,-3724893.702],
                            'Calc_PV_CF_2486.0':[4384868.476,-3741759.048],
                            'Calc_PV_CF_2487.0':[4379614.337,-3717010.873],
                            'Calc_PV_CF_2488.0':[4389307.584,-3754514.639],
                            'Calc_PV_CF_2489.0':[4400699.929,-3741759.048],
                            'Calc_PV_CF_2490.0':[4379651.262,-3714723.435]})    

Solution

  • The following should work:

    df['99th_percentile'] = df[cols].apply(lambda x: numpy.percentile(x, 99), axis=1)

    I'm assuming here that the variable 'cols' contains a list of the columns you want to include in the percentile calculation (you obviously can't use Description, for example).

    This code loops over the rows of the dataframe and, for each row, calls numpy.percentile to get the 99th percentile. You'll need to import numpy.

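    If you need maximum speed, you can drop the per-row apply and let numpy.percentile work along axis 1 of the underlying array. numpy.vectorize would not help here: it calls the function once per element rather than once per row, and it is still a Python loop under the hood.

    df['99th_percentile'] = numpy.percentile(df[cols].values, 99, axis=1)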
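
For reference, here is a minimal end-to-end sketch that puts the row-wise apply from the answer next to a single numpy.percentile call with axis=1. It uses a shortened copy of the question's frame (three of the twelve value columns). It assumes every column to include starts with 'Calc_PV_CF_'; that prefix filter and the output column names p99_apply and p99_axis are illustrative choices, not something from the original post.

```python
import numpy as np
import pandas as pd

# Shortened copy of the question's DataFrame (only three value columns).
df = pd.DataFrame({
    'Transaction Number': [1, 10],
    'Description': ['asf', 'def'],
    'Calc_PV_CF_2479.0': [4418494.085, -3706270.679],
    'Calc_PV_CF_2480.0': [4415476.321, -3688327.494],
    'Calc_PV_CF_2481.0': [4421698.198, -3712887.034],
})

# Assumption: the value columns are exactly those with this prefix.
cols = [c for c in df.columns if c.startswith('Calc_PV_CF_')]

# Row-wise apply: the lambda receives one row (a Series) at a time.
df['p99_apply'] = df[cols].apply(lambda row: np.percentile(row, 99), axis=1)

# One call on the 2-D array: axis=1 reduces across the columns of each row.
df['p99_axis'] = np.percentile(df[cols].to_numpy(), 99, axis=1)

print(df[['Transaction Number', 'Description', 'p99_apply', 'p99_axis']])
```

Both columns should hold the same values, one per row. With the default linear interpolation, df[cols].quantile(0.99, axis=1) is a pandas-only way to get the same result.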