I would like to calculate and plot changing numbers of differently colored animals over time using dplyr and ggplot2.
I have observations of different animals on random dates, so I would first like to group those observations into 4-day brackets and then calculate the mean color for each bracket. I filled in the column Bracket.mean with placeholder values for the first few rows just to show what I have in mind. If possible, I would like to add those means to the same data frame (as opposed to creating a new data.frame or vectors) for later analysis and plotting.
For the plot, I’m hoping to show the bracket means over time with some measure of variance around them (SD or boxplots), as well as the daily observations (perhaps as a faded overlay in the background).
Below is part of the dataset I'm using (with a made-up 'Bracket.mean' column that I’m hoping to calculate). 'Count' is the number of animals of a specific 'Color' on a given 'Date'.
Date Julian Count Color Bracket.mean
4/19/16 110 1 50 mean of 4/19-4/22
4/19/16 110 1 50 mean of 4/19-4/22
4/19/16 110 1 100 mean of 4/19-4/22
4/20/16 111 4 50 mean of 4/19-4/22
4/20/16 111 1 0 mean of 4/19-4/22
4/20/16 111 2 100 mean of 4/19-4/22
4/20/16 111 1 50 mean of 4/19-4/22
4/20/16 111 2 100 mean of 4/19-4/22
4/21/16 112 1 100 mean of 4/19-4/22
4/21/16 112 2 50 mean of 4/19-4/22
4/21/16 112 4 50 mean of 4/19-4/22
4/21/16 112 1 100 mean of 4/19-4/22
4/21/16 112 2 50 mean of 4/19-4/22
4/21/16 112 1 0 mean of 4/19-4/22
4/22/16 113 2 0 mean of 4/19-4/22
4/22/16 113 4 50 mean of 4/19-4/22
4/23/16 114 6 0 mean of 4/23-4/26
4/23/16 114 1 50 mean of 4/23-4/26
4/24/16 115 2 0 mean of 4/23-4/26
4/26/16 117 5 0 mean of 4/23-4/26
4/30/16 121 1 50
5/2/16 123 1 NA
5/2/16 123 1 50
5/7/16 128 2 0
5/7/16 128 3 0
5/7/16 128 3 0
5/8/16 129 4 0
5/8/16 129 1 0
5/10/16 131 1 50
5/10/16 131 4 50
5/12/16 133 1 0
5/13/16 134 1 50
5/14/16 135 1 0
5/14/16 135 2 50
5/14/16 135 2 0
5/14/16 135 1 0
5/17/16 138 1 0
5/17/16 138 2 0
5/23/16 144 1 0
5/24/16 145 4 0
5/24/16 145 1 0
5/24/16 145 1 0
5/27/16 148 3 NA
5/27/16 148 1 0
5/27/16 148 1 50
Any help would be greatly appreciated. Thanks very much in advance!
Something like this should get you started.
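library(dplyr)
# Note: this overwrites df with one summary row per 4-day bracket
df <- df %>%
  mutate(Date = as.Date(Date, format = '%m/%d/%y'),
         # cut() labels each date with the start date of its 4-day bracket
         Start = as.Date(cut(Date, breaks = seq(min(Date), max(Date) + 4, by = 4)))) %>%
  mutate(End = Start + 3) %>%
  group_by(Start, End) %>%
  summarise(meanColor = mean(Color, na.rm = TRUE),
            sdColor = sd(Color, na.rm = TRUE))
df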
#Source: local data frame [10 x 4]
#Groups: Start [?]
#        Start        End meanColor  sdColor
#       <date>     <date>     <dbl>    <dbl>
#1  2016-04-19 2016-04-22  56.25000 35.93976
#2  2016-04-23 2016-04-26  12.50000 25.00000
#3  2016-04-27 2016-04-30  50.00000       NA
#4  2016-05-01 2016-05-04  50.00000       NA
#5  2016-05-05 2016-05-08   0.00000  0.00000
#6  2016-05-09 2016-05-12  33.33333 28.86751
#7  2016-05-13 2016-05-16  20.00000 27.38613
#8  2016-05-17 2016-05-20   0.00000  0.00000
#9  2016-05-21 2016-05-24   0.00000  0.00000
#10 2016-05-25 2016-05-28  25.00000 35.35534
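Note that summarise() collapses the data to one row per bracket. If you would rather keep the bracket means in the original data frame, as the question asks, swapping summarise() for mutate() is one way to do it. The following is only a minimal sketch on a small made-up data frame: the name obs, the Bracket, Bracket.mean and Bracket.sd columns, and the integer-division bracketing (used here instead of cut()) are my own assumptions, not something from the question.

```r
library(dplyr)

# Hypothetical toy data in the same shape as the question's table
obs <- data.frame(
  Date  = c("4/19/16", "4/20/16", "4/20/16", "4/22/16", "4/23/16", "4/26/16"),
  Count = c(1, 4, 1, 2, 6, 5),
  Color = c(50, 100, NA, 0, 50, 0)
)

obs <- obs %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%y"),
         # index of the 4-day bracket, counted from the earliest date
         Bracket = as.integer(Date - min(Date)) %/% 4,
         Start = min(Date) + 4 * Bracket,
         End = Start + 3) %>%
  group_by(Start, End) %>%
  # mutate() keeps every observation and repeats the bracket statistic on each row
  mutate(Bracket.mean = mean(Color, na.rm = TRUE),
         Bracket.sd = sd(Color, na.rm = TRUE)) %>%
  ungroup()

obs
```

Every original row is still there, so the same data frame can feed both the later analysis and a plot of the raw observations.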
Then plot using:
library(ggplot2)
ggplot(df) + geom_line(aes(Start,meanColor))
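To also get the variance and the faded daily observations mentioned in the question, you could layer the raw data and the bracket summary in one plot. This is a rough sketch under my own assumptions: obs and brackets are hypothetical names, the toy data is made up, the error bars show mean ± 1 SD, and the bracket means are drawn at the bracket start date (you may prefer the midpoint).

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy observations (one row per sighting)
obs <- data.frame(
  Date  = as.Date(c("2016-04-19", "2016-04-20", "2016-04-20",
                    "2016-04-22", "2016-04-23", "2016-04-26")),
  Color = c(50, 100, NA, 0, 50, 0)
)

# One row per 4-day bracket, with mean and SD of Color
brackets <- obs %>%
  mutate(Start = min(Date) + 4 * (as.integer(Date - min(Date)) %/% 4)) %>%
  group_by(Start) %>%
  summarise(meanColor = mean(Color, na.rm = TRUE),
            sdColor = sd(Color, na.rm = TRUE))

p <- ggplot() +
  # raw daily observations, faded into the background
  geom_point(data = filter(obs, !is.na(Color)),
             aes(Date, Color), alpha = 0.3) +
  # mean +/- 1 SD for each bracket
  geom_errorbar(data = brackets,
                aes(x = Start, ymin = meanColor - sdColor,
                    ymax = meanColor + sdColor), width = 1) +
  geom_line(data = brackets, aes(Start, meanColor)) +
  geom_point(data = brackets, aes(Start, meanColor)) +
  labs(x = "Bracket start", y = "Color")

print(p)
```

Brackets with a single observation have an SD of NA, so ggplot2 will drop their error bars with a warning; the mean point and line are still drawn.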