Replace column values based on percentiles in python

I have grouped a dataframe and I would like, per group, to replace the values in some columns if they or smaller than some percentile. So all values within a group that are larger than the 0.95 percentile should be replaced by the 0.95 percentile and all the values that are smaller than the 0.05 percentile should be replaced by the 0.05 percentile.
The dataframe could look like this(example taken from another question): Two groups: ‘one’ and ‘two’

    A           B           C
0   0.719391    0.091693    one
1   0.951499    0.83716     one
2   0.975212    0.224855    one
3   0.80762     0.031284    one
4   0.63319     0.342889    one
5   0.075102    0.899291    one
6   0.502843    0.773424    one
7   0.032285    0.242476    one
8   0.794938    0.607745    one
9   0.620387    0.574222    one
10  0.446639    0.549749    two
11  0.664324    0.134041    two
12  0.622217    0.505057    two
13  0.670338    0.99087     two
14  0.281431    0.016245    two
15  0.675756    0.185967    two
16  0.145147    0.045686    two
17  0.404413    0.191482    two
18  0.94913     0.943509    two
19  0.164642    0.157013    two

The result for this dataframe should be:

A               B           C
0   0.719391    0.091693    one
1   0.951499    0.83716     one
2   0.96454115  0.224855    one
3   0.80762     0.05846805  one
4   0.63319     0.342889    one
5   0.075102    0.87133205  one
6   0.502843    0.773424    one
7   0.05155265  0.242476    one
8   0.794938    0.607745    one
9   0.620387    0.574222    one
10  0.446639    0.549749    two
11  0.664324    0.134041    two
12  0.622217    0.505057    two
13  0.670338    0.96955755  two
14  0.281431    0.02949345  two
15  0.675756    0.185967    two
16  0.15391975  0.045686    two
17  0.404413    0.191482    two
18  0.8261117   0.943509    two
19  0.164642    0.157013    two

Notice that for column A rows 2, 7, 16 and 18 have been replaced; and for column B rows 3, 5, 13, and 14 have been replaced.

Does anyone knows how to do this in an efficient way for a large dataframe?

Thanks

Solution

You could use groupby + quantile + clip:

g = df.groupby('C').transform(lambda x: x.clip(*x.quantile([0.05, 0.95])))
g['C'] = df['C']

           A         B    C
0   0.719391  0.091693  one
1   0.951499  0.837160  one
2   0.964541  0.224855  one
3   0.807620  0.058468  one
4   0.633190  0.342889  one
5   0.075102  0.871332  one
6   0.502843  0.773424  one
7   0.051553  0.242476  one
8   0.794938  0.607745  one
9   0.620387  0.574222  one
10  0.446639  0.549749  two
11  0.664324  0.134041  two
12  0.622217  0.505057  two
13  0.670338  0.969558  two
14  0.281431  0.029493  two
15  0.675756  0.185967  two
16  0.153920  0.045686  two
17  0.404413  0.191482  two
18  0.826112  0.943509  two
19  0.164642  0.157013  two

Sanity Check

np.allclose(e[['A', 'B']].values, g[['A', 'B']].values)
True

Here, e is the output in your question.