When dealing with time series problems in R, I have multiple observations for one timestamp, how to replace the value for one timestamp by the mean value of all the observations for this timestamp and delete all the overlapped timestamp rows.
For example, I have a time series like this:
Date={2016-3-1, 2016-4-1, 2016-4-1, 2016-4-1,2016-5-1,2015-5-1, 2016-6-1).
Price={111,122,124,155,142,177,134}
My expected results are like this:
Date={2016-3-1,2016-4-1,2016-5-1,2017-6-1},
Price={111,133.67,159.5,134 }
(133.67=mean(122,124,155), 159.5=mean(142,177))
Is this what you're after?
Date=c("2016-3-1", "2016-4-1", "2016-4-1", "2016-4-1","2016-5-1","2015-5-1", "2017-6-1")
Price=c(111,122,124,155,142,177,134)
library(dplyr)
library(lubridate)
df <- data.frame(Date = ymd(Date), Price = Price)
df %>%
group_by(ymd(Date)) %>%
summarise(mean = sprintf("%0.2f",mean(Price, na.rm = TRUE)))
# # A tibble: 5 x 2
# `ymd(Date)` mean
# <date> <chr>
# 1 2015-05-01 177.00
# 2 2016-03-01 111.00
# 3 2016-04-01 133.67
# 4 2016-05-01 142.00
# 5 2017-06-01 134.00
If you want baseR
this too works:
print(aggregate(list(Mean=df$Price), list(Date = df$Date), mean), digits = 5)
# Date Mean
# 1 2015-05-01 177.00
# 2 2016-03-01 111.00
# 3 2016-04-01 133.67
# 4 2016-05-01 142.00
# 5 2017-06-01 134.00