R: smooth df values by columns

I have a df that I want to turn into a line graph. As you see below, the graphs have a lot of spikes, so I decided I should smooth the values for each variable.

Is there a way to replace each cell of a df with a 20-row rolling average without using loops?

Example data:

df = structure(list(Date = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 
100, 110, 120, 130, 140, 150, 160, 170, 180, 190), `0_3` = c(2.96069423175089, 
2.98934234468417, 3.0166710770045, 2.93318848451928, 2.9029582526956, 
2.93122886133033, 2.95467584624211, 2.92056074766355, 2.9673590504451, 
2.99909118448955, 3.0678648899907, 3.08758664146188, 3.16639741518578, 
3.1981536432575, 3.23886639676113, 3.32871012482663, 3.2554847841472, 
3.33575054387237, 3.25720703856234, 3.28495034377387), `0_6` = c(2.65441551812149, 
2.70340525084481, 2.75205080709182, 2.71591526344378, 2.76472214542438, 
2.73393461104848, 2.75387263339071, 2.77453271028037, 2.7299703264095, 
2.66585883065738, 2.69600247908274, 2.67800882167612, 2.7140549273021, 
2.63765248928454, 2.69905533063428, 2.66990291262136, 2.689313517339, 
2.75562001450326, 2.77049794084613, 2.78838808250573)), .Names = c("Date", 
"0_3", "0_6"), row.names = c(NA, 20L), class = "data.frame")

So far I can only smooth it with loops:

smooth_factor = 5
# Keep the rows that have a full window behind them; Date is carried over as-is
smooth_df = df[smooth_factor:nrow(df), ]
for (i in seq_len(nrow(smooth_df))) {
  # Row i of smooth_df corresponds to row `last` of df
  last = i + smooth_factor - 1
  for (j in colnames(smooth_df)[2:ncol(smooth_df)]) {
    # The first column contains Date, which should not be smoothed
    smooth_df[i, j] = mean(df[i:last, j])
  }
}

This is what it looks like when I apply this method to a bigger data set:

[Figure: noisy data with a lot of spikes]

turns into

[Figure: data set smoothed by 20 steps]


  • You are looking for apply (base R) and rollmean (from the zoo package):

     npoints <- 5
          Date      0_3      0_6
     [1,]   20 2.960571 2.718102
     [2,]   30 2.954678 2.734006
     [3,]   40 2.947745 2.744099
     [4,]   50 2.928522 2.748595
     [5,]   60 2.935357 2.751406
     [6,]   70 2.954583 2.731634
     [7,]   80 2.981910 2.724047
     [8,]   90 3.008493 2.708875
     [9,]  100 3.057660 2.696779
    [10,]  110 3.103819 2.678316
    [11,]  120 3.151774 2.684955
    [12,]  130 3.203943 2.679735
    [13,]  140 3.237522 2.681996
    [14,]  150 3.271393 2.690309
    [15,]  160 3.283204 2.716878
    [16,]  170 3.292421 2.734744
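
    The output above is consistent with passing every column through rollmean with a centered window, although the exact call is not shown. A minimal sketch of that idea, assuming zoo is installed and using a small made-up data frame (`toy` and its columns `a` and `b` are hypothetical names, not part of the question's data):

```r
library(zoo)  # provides rollmean()

# Hypothetical toy data with the same layout as the question:
# a Date column followed by value columns
toy <- data.frame(Date = seq(0, 90, by = 10),
                  a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                  b = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20))

npoints <- 5

# MARGIN = 2 hands each column to rollmean() in turn.
# rollmean() centers the window by default and drops incomplete windows,
# so the result is a numeric matrix with nrow(toy) - npoints + 1 rows.
smoothed <- apply(toy, 2, function(x) rollmean(x, npoints))

print(smoothed)
```

    Note that Date is averaged along with the other columns here. With evenly spaced dates and an odd window this simply yields the date at the center of each window, which would explain why the output above starts at 20 rather than 0.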

    With data.table (and zoo loaded for rollmean), it would be:

    setDT(df)  # convert the data.frame to a data.table by reference
    df[, lapply(.SD, function(x) rollmean(x, npoints)), .SDcols = names(df)]
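
    If the goal is closer to the loop in the question (a trailing window, with Date left untouched), one possible sketch without extra packages uses stats::filter from base R. The data frame `toy`, the helper `trailing_mean`, and the variable `k` are hypothetical names introduced only for this illustration:

```r
# Hypothetical toy data: a Date column followed by value columns
toy <- data.frame(Date = seq(0, 90, by = 10),
                  a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                  b = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20))

k <- 5  # window length in rows

# Every column except Date gets smoothed
value_cols <- setdiff(names(toy), "Date")

trailing_mean <- function(x, k) {
  # sides = 1 averages the current value with the k - 1 values before it;
  # the first k - 1 positions have no full window and come back as NA
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

smoothed <- toy
smoothed[value_cols] <- lapply(toy[value_cols], trailing_mean, k = k)

# Drop the leading rows that lack a full window
smoothed <- smoothed[k:nrow(smoothed), ]

print(smoothed)
```

    Here each smoothed row keeps the Date of the last observation in its window, so the result starts at the k-th row of the input rather than at the window center.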