
How to apply a function for each level of a factor variable?


I have a function like this:

remove_outliers <- function(x) {
  qnt <- quantile(x, probs = 0.99)
  y <- x
  y[x > qnt] <- NA
  y
}

The purpose is to remove outliers in the top 1% of the data by replacing their values with NA. How can I apply this function separately within each level of a factor variable?

For example, an original dataset with groups A and B:

group share
A     100
A     50
A     30
A     10
...   ...
B     100
B     90
B     80
B     60
...   ...

It should end up like this:

group share
A     NA
A     50
A     30
A     10
...   ...
B     NA
B     90
B     80
B     60
...   ...

I have already tried by, tapply, and sapply, but they all change the structure of the output instead of returning a column the same length as the original data.


Solution

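  • Have a look at ?ave; it does exactly what you are looking for. ave() applies the function within each group and returns a vector of the same length as its input, in the original row order: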

    remove_outliers<-function(x){
      qnt<- quantile( x,probs=0.99 )
      x[ x>qnt ]<- NA
      return(x)
    }
    
    # assuming your data.frame is called mdf
    mdf$fixed <- ave( mdf$share, mdf$group, FUN = remove_outliers )
    
    mdf
      group share fixed
    1     A   100    NA
    2     A    50    50
    3     A    30    30
    4     A    10    10
    5     B   100    NA
    6     B    90    90
    7     B    80    80
    8     B    60    60
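
To try the approach above end to end, here is a minimal, self-contained sketch. The data frame is rebuilt by hand from only the rows shown in the question (the elided rows are left out), and remove_outliers_na with its prob argument is a hypothetical variant, not part of the original answer: it adds na.rm = TRUE on the assumption that the real data may already contain missing values, which would otherwise make quantile() stop with an error. The last line contrasts the result with tapply(), which returns one element per group rather than one value per row.

```r
# Example data rebuilt from the rows shown in the question (illustrative only).
mdf <- data.frame(
  group = c("A", "A", "A", "A", "B", "B", "B", "B"),
  share = c(100, 50, 30, 10, 100, 90, 80, 60)
)

# Hypothetical variant of remove_outliers: na.rm = TRUE is an added assumption
# so that values that are already NA do not cause quantile() to fail.
remove_outliers_na <- function(x, prob = 0.99) {
  qnt <- quantile(x, probs = prob, na.rm = TRUE)
  # Only non-missing values above the cutoff are replaced.
  x[!is.na(x) & x > qnt] <- NA
  x
}

# ave() splits share by group, applies FUN to each piece, and writes the
# results back in the original row order, so the column length is unchanged.
mdf$fixed <- ave(mdf$share, mdf$group, FUN = remove_outliers_na)

# For comparison: tapply() with the same function returns one element per
# group (here a list-like object of length 2), not one value per row.
per_group <- tapply(mdf$share, mdf$group, remove_outliers_na)
```

Because ave() returns a vector as long as mdf$share, the result can be assigned straight to a new column, whereas the tapply() result would first have to be unpacked and matched back to the rows.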