Search code examples
rtime-seriesoutliersimputationforecast

how to replace the value for one timestamp by the mean value of all the observations for this timestamp


When dealing with time series problems in R, I have multiple observations for one timestamp, how to replace the value for one timestamp by the mean value of all the observations for this timestamp and delete all the overlapped timestamp rows.

For example, I have a time series like this:

Date={2016-3-1, 2016-4-1, 2016-4-1, 2016-4-1,2016-5-1,2015-5-1, 2016-6-1).
Price={111,122,124,155,142,177,134}

My expected results are like this:

Date={2016-3-1,2016-4-1,2016-5-1,2017-6-1}, 
Price={111,133.67,159.5,134 }
(133.67=mean(122,124,155), 159.5=mean(142,177))

Solution

  • Is this what you're after?

    Date=c("2016-3-1", "2016-4-1", "2016-4-1", "2016-4-1","2016-5-1","2015-5-1", "2017-6-1")
    Price=c(111,122,124,155,142,177,134)
    
    library(dplyr)
    library(lubridate)
    df <- data.frame(Date = ymd(Date), Price = Price)
    df %>% 
      group_by(ymd(Date)) %>%
      summarise(mean = sprintf("%0.2f",mean(Price, na.rm = TRUE)))
    
    # # A tibble: 5 x 2
    #     `ymd(Date)` mean  
    #     <date>      <chr> 
    # 1 2015-05-01  177.00
    # 2 2016-03-01  111.00
    # 3 2016-04-01  133.67
    # 4 2016-05-01  142.00
    # 5 2017-06-01  134.00
    

    If you want baseR this too works:

    print(aggregate(list(Mean=df$Price), list(Date = df$Date), mean), digits = 5)
    
    #       Date   Mean
    # 1 2015-05-01 177.00
    # 2 2016-03-01 111.00
    # 3 2016-04-01 133.67
    # 4 2016-05-01 142.00
    # 5 2017-06-01 134.00