I have a dataframe where the first column is a date/time series and the other 9 columns are sites with associated water temperature. I want to calculate the monthly summary stats for each column. The summary stats I want are, monthly mean with the +/- sd temp, monthly proportion of time > 20C & 23C, monthly maximum temperature. Here is a sample df of my data
# Create a, b, c, d variables
a <- c("06-25-20 08:00:00 AM","06-25-20 08:15:00 AM",
"06-25-20 08:30:00 AM","06-25-20 08:45:00 AM",
"07-25-20 08:45:00 AM", "07-25-20 08:45:00 AM",
"08-25-20 08:45:00 AM", "08-25-20 08:45:00 AM",
"09-25-20 08:45:00 AM","09-25-20 08:45:00 AM")
b <- c(4,5,8, "N/A", 4,5,"N/A",7,7,6)
c <- c(6,10,8, "N/A", 8,5,"N/A",8,7,2)
# Join the variables to create a data frame
df <- data.frame(a,b,c)
df$a = as.POSIXlt(df$a, format="%m-%d-%y%H:%M:%S", tz = 'EST')
I started just trying to get the monthly mean which works, but for the life of me I cannot get the table to also include Standard deviation, and all the other summary stats without making a bunch of extra code. Here is code I used for mean
Monthly_2020Temp = df %>%
group_by(a = format(as.Date(a), '%b-%Y')) %>%
summarise_each(funs( if(length(na.omit(.))>=15)
mean(., na.rm=TRUE) else NA_real_),
b:c)
Here is the code I tried to use when adding standard deviation
Monthly_2020Temp = df %>%
group_by(a = format(as.Date(a), '%b-%Y')) %>%
summarise_each(funs( if(length(na.omit(.))>=15)
mean(., na.rm=TRUE) else NA_real_), sd(., na.rm=TRUE) else NA_real_),
b:c)
but I get an error
Error: unexpected 'else' in:
" summarise_each(funs( if(length(na.omit(.))>=15)
mean(., na.rm=TRUE) else NA_real_), sd(., na.rm=TRUE) else"
> b:c)
Error: unexpected ')' in " b:c)"
Can someone help me make a nice table of summary stats for my time series data?
It's best to get the data into "long" format when using dplyr summarize. Here's one possible approach to get you started:
df$b <- as.numeric(df$b)
df$c <- as.numeric(df$c)
df %>% pivot_longer(-c(a)) %>%
mutate(month = lubridate::month(a), year = lubridate::year(a)) %>%
group_by(month, year) %>%
summarize(avg = mean(value, na.rm = TRUE), sd = sd(value, na.rm = TRUE))