Winsorizing across all columns in a data frame (R) using `lapply`


I am trying to apply the Winsorize() function from the DescTools package using lapply. What I currently have is:

data$col1 <- Winsorize(data$col1)

This replaces the extreme values with values based on quantiles, changing the data below as follows:

> data$col1
 [1]   -0.06775798   **-0.55213508**   -0.12338265
 [4]    0.04928349    **0.47524313**    0.04782829
 [7]   -0.05070639 **-112.67126382**    0.12657896
[10]   -0.12886632

> Winsorize(data$col1)
 [1] -0.06775798 **-0.37884540** -0.12338265  0.04928349
 [5]  **0.26038103**  0.04782829 -0.05070639 **-0.37884540**
 [9]  0.12657896 -0.12886632

I have a for loop that does this across all columns of the data.frame (col1, col2, col3, col4). However, I understand lapply is a better option, so I am trying to do the same with lapply but cannot get it working. If anybody can point me in the right direction, it would be much appreciated.

The data:

data <- structure(list(EQ.TA = c(-0.0677579847115102, -0.552135083517749, 
-0.123382654164705, 0.0492834931482554, 0.475243125304193, 0.0478282913638668, 
-0.050706389027946, -112.671263815473, 0.126578956975704, -0.128866322940619
), NI.EQ = c(3.64670235329765, 1.66115713369585, 0.209424623633739, 
0.340430636358184, -0.248411254566261, -12.1709277350516, 1.06888235737433, 
0.0515582237132515, 0.177323118521857, 0.419879195374698), NI.TA = c(-0.24709320230217, 
-0.917183132749265, -0.0258393659113752, 0.0167776109344148, 
-0.118055740980805, -0.582114677880617, -0.0541991646381309, 
-5.80913022585296, 0.0224453753901758, -0.0541082879872031), 
    TL.TA = c(1.06775798471151, 1.55213508351775, 1.12338265416471, 
    0.950716506851745, 0.524756874695807, 0.952171708636133, 
    1.05070638902795, 113.671263815473, 0.873421043024296, 1.12886632294062
    )), .Names = c("EQ.TA", "NI.EQ", "NI.TA", "TL.TA"), row.names = c(NA, 
10L), class = "data.frame")

Solution

  • You can lapply over the whole data.frame and assign the result back with `data[]`, which keeps the data.frame structure:

    library(DescTools)
    data[]<-lapply(data, Winsorize)
    
    data
    #          EQ.TA       NI.EQ       NI.TA      TL.TA
    #1   -0.06775798  2.75320700 -0.24709320  1.0677580
    #2   -0.55213508  1.66115713 -0.91718313  1.5521351
    #3   -0.12338265  0.20942462 -0.02583937  1.1233827
    #4    0.04928349  0.34043064  0.01677761  0.9507165
    #5    0.31834425 -0.24841125 -0.11805574  0.6816558
    #6    0.04782829 -6.80579532 -0.58211468  0.9521717
    #7   -0.05070639  1.06888236 -0.05419916  1.0507064
    #8  -62.21765589  0.05155822 -3.60775403 63.2176559
    #9    0.12657896  0.17732312  0.01989488  0.8734210
    #10  -0.12886632  0.41987920 -0.05410829  1.1288663
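
To see what is happening per column, here is a minimal base-R sketch. `winsorize_sketch` is a hypothetical stand-in, not the DescTools implementation; it assumes the 5% and 95% quantiles as cut-offs, which is consistent with the output shown above. It uses only two of the columns from the question to stay short.

```r
# Hypothetical stand-in for DescTools::Winsorize(): clamp values to the
# 5% and 95% quantiles (assumed cut-offs, not taken from the package source).
winsorize_sketch <- function(x, probs = c(0.05, 0.95)) {
  limits <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

# Two columns from the question's data
data <- data.frame(
  EQ.TA = c(-0.0677579847115102, -0.552135083517749, -0.123382654164705,
            0.0492834931482554, 0.475243125304193, 0.0478282913638668,
            -0.050706389027946, -112.671263815473, 0.126578956975704,
            -0.128866322940619),
  NI.EQ = c(3.64670235329765, 1.66115713369585, 0.209424623633739,
            0.340430636358184, -0.248411254566261, -12.1709277350516,
            1.06888235737433, 0.0515582237132515, 0.177323118521857,
            0.419879195374698)
)

# lapply() on its own returns a plain list, one element per column
as_list <- lapply(data, winsorize_sketch)
class(as_list)   # "list"

# Assigning into data[] replaces the columns but keeps the data.frame
data[] <- lapply(data, winsorize_sketch)
class(data)      # "data.frame"
```

The empty brackets are the important part: `data <- lapply(data, Winsorize)` would overwrite the data.frame with a list, whereas `data[] <- ...` fills the existing columns in place, so row names and column names are preserved.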