
Apply LOESS filter / regression to all columns of my data frame


I have a data frame containing 100 columns of numerical values, where each column is a different circular shift of the first column. I need to apply a LOESS filter to those columns one by one. In my context the covariate is simple: just the index 1, 2, 3, ..., <number of rows>.

How can I obtain the smoothed values in a new data frame? Thank you!


Solution

  • Assuming your data frame is called dat, you can do:

    ## response: names of the columns to smooth
    vars <- colnames(dat)
    ## covariate: the row index 1, 2, ..., nrow(dat)
    id <- seq_len(nrow(dat))
    ## define a loess filter function: regress one column on id with a
    ## local-linear loess fit and keep only the fitted values
    loess.filter <- function (x, span) loess(formula = as.formula(paste(x, "id", sep = "~")),
                                             data = dat,
                                             degree = 1,
                                             span = span)$fitted
    ## apply filter column-by-column
    new.dat <- as.data.frame(lapply(vars, loess.filter, span = 0.75),
                             col.names = colnames(dat))
    

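    The function loess.filter is based on R's built-in function loess. Have a look at ?loess if you have never used it. Here, we have used the following function arguments:

    • formula: we generate the formula on the fly, one per column (for example var1 ~ id);
    • data: the data frame in which the response columns are looked up;
    • degree: the degree of the local polynomial; degree = 1 fits local lines (the default is 2);
    • span: the smoothing parameter, i.e. the fraction of the data used in each local fit (the default is 0.75).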

    We use lapply to apply loess column-by-column, retaining only the fitted / smoothed values. If you have never used lapply before, have a read of ?lapply.
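
    The same idea can also be wrapped in a function that takes the data frame as an argument, so it does not rely on dat and id existing in the global environment. This is only a sketch of an alternative: the name smooth_columns is made up for illustration, and it assumes that every column is numeric and has no missing values (with NAs, loess drops rows by default and the fitted vector would be shorter than the column).

    ```r
    ## hypothetical helper, not part of the answer above
    smooth_columns <- function(df, span = 0.75, degree = 1) {
      ## covariate: the row index
      idx <- seq_len(nrow(df))
      ## lapply over a data frame iterates over its columns and keeps their names
      out <- lapply(df, function(y) {
        fit <- loess(y ~ idx, span = span, degree = degree)
        as.numeric(fitted(fit))
      })
      as.data.frame(out)
    }

    ## usage on simulated data
    set.seed(1)
    demo <- data.frame(a = rnorm(20), b = rnorm(20))
    demo.smooth <- smooth_columns(demo, span = 0.75)
    ```

    Because lapply runs over the columns themselves rather than over their names, no formula has to be pasted together from strings.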

    We can customize span. You can compare:

    as.data.frame(lapply(vars, loess.filter, span = 1),
                  col.names = colnames(dat))
    as.data.frame(lapply(vars, loess.filter, span = 0.75),
                  col.names = colnames(dat))
    as.data.frame(lapply(vars, loess.filter, span = 0.5),
                  col.names = colnames(dat))
    

    As we choose a gradually smaller span, the result gets closer to the original data, but it also becomes more and more jagged.
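
    To put rough numbers on this trade-off, one option is to compute, for each span, how far the fit is from the data and how wiggly it is. The sketch below is illustrative only: span_summary is a made-up helper, the residual sum of squares (rss) stands in for "closeness to the data", and the sum of squared second differences (roughness) stands in for "jaggedness". The simulated series is likewise an assumption, not the data from the question.

    ```r
    ## hypothetical helper: summarise loess fits of y on its index for several spans
    span_summary <- function(y, spans = c(1, 0.75, 0.5)) {
      idx <- seq_along(y)
      rows <- lapply(spans, function(s) {
        fit <- as.numeric(fitted(loess(y ~ idx, degree = 1, span = s)))
        data.frame(span = s,
                   rss = sum((y - fit)^2),                        # distance from the data
                   roughness = sum(diff(fit, differences = 2)^2)) # wiggliness of the fit
      })
      do.call(rbind, rows)
    }

    ## simulated noisy series
    set.seed(42)
    y <- sin(seq_len(100) / 10) + rnorm(100, sd = 0.3)
    res <- span_summary(y)
    res
    ```

    On a series like this you would typically see rss fall and roughness rise as span shrinks, which is the behaviour described above.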


    Here is a small example, using span = 0.75.

    ## example data
    set.seed(0); dat <- as.data.frame(replicate(3, rnorm(10)))
    colnames(dat) <- paste0("var", 1:ncol(dat))
    

    Original data:

    > dat
              var1        var2        var3
    1   1.68382474 -1.74121307  2.71648728
    2  -0.68325574  1.23062681  0.04827926
    3   0.50518377  0.28811377  0.01184018
    4   0.04106266 -0.85230469 -0.28150053
    5   0.19244324  0.25739150 -0.03539714
    6  -0.31722642 -1.36826320 -0.68331669
    7   1.48740413 -0.05923145  2.13633374
    8   0.63805589 -0.70888114 -0.83978457
    9   1.42104234  0.75622827  0.83117970
    10 -0.55051748 -1.65601708  0.41827418
    

    After applying my code:

    > new.dat
             var1       var2        var3
    1  0.85647777 -0.5045655  1.76600194
    2  0.56284689 -0.3124571  1.05971504
    3  0.26893906 -0.1369094  0.39435505
    4  0.09054923 -0.1186259 -0.15040237
    5  0.18381641 -0.4725185 -0.04259514
    6  0.40755479 -0.4982544  0.23026628
    7  0.67075652 -0.4481397  0.30250611
    8  0.64421508 -0.4552548  0.41389728
    9  0.48725209 -0.5845782  0.44169083
    10 0.27764338 -0.7238709  0.44952801
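
    For data shaped like that in the question (each column a circular shift of the first), the same approach applies unchanged. The sketch below builds such a data frame and smooths it; circ_shift is a hypothetical helper written for this illustration, and the base series is simulated.

    ```r
    ## hypothetical helper: move the last k elements of x to the front
    circ_shift <- function(x, k) {
      n <- length(x)
      k <- k %% n
      if (k == 0) return(x)
      c(x[(n - k + 1):n], x[1:(n - k)])
    }

    ## simulated base series and four circular shifts of it
    set.seed(1)
    base <- sin(seq_len(50) / 5) + rnorm(50, sd = 0.2)
    shifted <- as.data.frame(sapply(0:3, function(k) circ_shift(base, k)))
    colnames(shifted) <- paste0("shift", 0:3)

    ## smooth every column against the row index
    idx <- seq_len(nrow(shifted))
    smoothed <- as.data.frame(lapply(shifted, function(y) {
      as.numeric(fitted(loess(y ~ idx, degree = 1, span = 0.75)))
    }))
    ```

    Note that loess treats the first and last rows as boundaries rather than as neighbours, so smoothing a shifted column is in general not the same as shifting the smoothed first column.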