I have a dataframe of daily observations dating from 1963-2022. I want to calculate the mean of the observation for each month. However, some months don't have data for each day and some only have one datapoint for one month. This skews some of the data points. How do I calculate how many observations have been used to calculate the mean for a given month.
structure(list(prcp_amt = c(0, 1.8, 6.4, 5.1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 4.3, 0, 0, 0, 0, 4.6, 0, 0, 0, 0, 0, 0, 0, 0.3,
4.8, 0, 0, 4.1, 0, 0, 0, 0.3, 3.6, 6.6, 0, 0, 0, 0, 0, 0, 0.8,
0, 0, 0, 0, 0), ob_date = structure(c(-220838400, -220752000,
-220665600, -220579200, -220492800, -220406400, -220320000, -220233600,
-220147200, -220060800, -219974400, -219888000, -219801600, -219715200,
-219628800, -219542400, -219456000, -219369600, -219283200, -219196800,
-219110400, -219024000, -218937600, -218851200, -218764800, -218678400,
-218592000, -218505600, -218419200, -218332800, -218246400, -218160000,
-218073600, -217987200, -217900800, -217814400, -217728000, -217641600,
-217555200, -217468800, -217382400, -217296000, -217209600, -217123200,
-217036800, -216950400, -216864000, -216777600, -216691200, -216604800
), class = c("POSIXct", "POSIXt"), tzone = "GMT")), row.names = c(NA,
50L), class = "data.frame")
# historic monthly rainfall
rainHist$month <- as.numeric(format(rainHist$ob_date, '%m'))
rainHist$year <- as.numeric(format(rainHist$ob_date, '%Y'))
rainHistMean <- aggregate(prcp_amt ~ month + year, rainHist, FUN=mean)
rainHistMean$day <- 01
rainHistMean <-
rainHistMean %>%
mutate(rainHistMean, Date=paste(year, month, day, sep='-'))
rainHistMean[['Date']] <- as.POSIXct(rainHistMean[['Date']],
format='%Y-%m-%d',
tz='GMT'
)
rainHist$month <- as.numeric(format(rainHist$ob_date, '%m'))
rainHist$year <- as.numeric(format(rainHist$ob_date, '%Y'))
rainHistMean <- aggregate(prcp_amt ~ month + year, rainHist, FUN=function(x) c(mean(x), length(x)))
names(rainHistMean) <- c('month', 'year', 'prcp_amt', 'n')
How do I get there to be 4 columns not 3 with a matrix?
rainHist$month <- as.numeric(format(rainHist$ob_date, '%m'))
rainHist$year <- as.numeric(format(rainHist$ob_date, '%Y'))
rainHistMean <- aggregate(prcp_amt ~ month + year, rainHist, FUN=function(x) c(mean(x), length(x)))
rainHistMean <- data.frame(rainHistMean[1:2], rainHistMean[[3]])
names(rainHistMean) <- c('month', 'year', 'prcp_amt', 'n')
There may be more elegant solutions, but you can use dplyr
to group by month and year, then get the count and mean in summarize
:
df %>%
group_by(month(ob_date), year(ob_date)) %>%
summarize(mean_prcp = mean(prcp_amt),
count = n())
Output:
# # Groups: month(ob_date) [2]
# `month(ob_date)` `year(ob_date)` mean_prcp count
# <dbl> <dbl> <dbl> <int>
# 1 1 1963 0.91 30
# 2 2 1963 0.77 20