I'm hoping to create a plot that shows a running average over a scatterplot of the observed data. The data consists of observations of hares' coat color (Color) over time (Julian).
Color Julian
50 85
50 87
50 89
50 90
100 91
50 91
50 92
50 92
100 92
50 93
100 93
50 93
50 95
100 95
50 95
50 96
50 96
50 99
50 100
0 101
0 101
0 103
50 103
50 104
50 104
50 104
50 104
100 104
100 104
50 109
50 109
100 109
0 110
0 110
50 110
50 110
50 110
50 110
0 112
A friend wrote a function for me that calculates a running average of the color observations, but I can't figure out how to add the line (haresAveNoNa) into the plot.
The function:
haresAverage <- matrix(NA, max(hares$Julian), 3)
for (i in 4:max(hares$Julian)) {
  # Observations within three days either side of day i
  inWindow <- hares$Julian >= (i - 3) & hares$Julian <= (i + 3)
  haresAverage[i, 1] <- i
  haresAverage[i, 2] <- mean(hares$Color[inWindow], na.rm = TRUE)
  haresAverage[i, 3] <- sd(hares$Color[inWindow], na.rm = TRUE)
}
haresAveNoNa <- na.omit(haresAverage)
The plot:
p <- ggplot(hares, aes(Julian, Color))
p +
  geom_jitter(width = 1, height = 5, color = "blue", alpha = .65)
Can you please help me add the running average 'haresAveNoNa' into the plot? Thanks very much!
You can calculate the rolling mean using rollmean from the zoo package instead of writing your own function. You can invoke rollmean on the fly, within ggplot, to add the rolling mean line, or you can add the rolling mean values to your data frame and then plot them. I provide examples below for both methods. The code below calculates a centered rolling mean over a window of seven consecutive observations (rows). Because your data has several observations on some days and none on others, this is not the same as a seven-day window. You can customize the window size (the k argument) and choose a left- or right-aligned rolling mean (the align argument) rather than a centered one.
library(zoo)
library(ggplot2)
# fill=NA pads the ends of the rolling mean (na.pad=TRUE is deprecated in zoo)
ggplot(hares, aes(Julian, Color)) +
  geom_point(position=position_jitter(1,3), pch=21, fill="#FF0000AA") +
  geom_line(aes(y=rollmean(Color, 7, fill=NA))) +
  theme_bw()
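As a rough illustration of the window-size and alignment options mentioned above, here is a minimal sketch using the first ten Color values from the question. The variable names (x, centered, trailing, narrow) are just for this example.

```r
library(zoo)

# First ten Color values from the question's data
x <- c(50, 50, 50, 50, 100, 50, 50, 50, 100, 50)

# Centered window of 7 observations: NA padding at both ends
centered <- rollmean(x, k = 7, fill = NA, align = "center")

# Right-aligned (trailing) window: each value averages the current
# observation and the six before it, so the padding is all at the start
trailing <- rollmean(x, k = 7, fill = NA, align = "right")

# Narrower window of 3 observations (centered by default)
narrow <- rollmean(x, k = 3, fill = NA)

print(centered)
print(trailing)
print(narrow)
```

A wider window gives a smoother line but leaves more NA values at the ends, so the plotted line will be shorter.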
To answer your specific question, let's say you actually do need to add the rolling mean line from separate data, rather than calculate it on the fly. If the rolling mean is another column in your data frame, you just need to give the new column name to geom_line:
# fill=NA pads the ends of the rolling mean (na.pad=TRUE is deprecated in zoo)
hares$roll7 = rollmean(hares$Color, 7, fill=NA)
ggplot(hares, aes(Julian, Color)) +
  geom_point(position=position_jitter(1,3), pch=21, fill="#FF0000AA") +
  geom_line(aes(y=roll7)) +
  theme_bw()
If the rolling mean is in a separate data frame, you need to feed that data frame to geom_line:
# fill=NA pads the ends of the rolling mean (na.pad=TRUE is deprecated in zoo)
haresAverage = data.frame(Julian=hares$Julian,
                          Color=rollmean(hares$Color, 7, fill=NA))
ggplot(hares, aes(Julian, Color)) +
  geom_point(position=position_jitter(1,3), pch=21, fill="#FF0000AA") +
  geom_line(data=haresAverage, aes(Julian, Color)) +
  theme_bw()
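The same idea should work for the matrix from the question: geom_line needs a data frame, so the matrix has to be converted and its columns named first. Below is a minimal sketch under some assumptions: the running mean is recomputed here in a compact two-column form (day and mean, over days i-3 to i+3) as a stand-in for your friend's loop, and haresAveDf is a name made up for this example.

```r
library(ggplot2)

# Data from the question
hares <- data.frame(
  Color  = c(50, 50, 50, 50, 100, 50, 50, 50, 100, 50, 100, 50, 50, 100,
             50, 50, 50, 50, 50, 0, 0, 0, 50, 50, 50, 50, 50, 100, 100,
             50, 50, 100, 0, 0, 50, 50, 50, 50, 0),
  Julian = c(85, 87, 89, 90, 91, 91, 92, 92, 92, 93, 93, 93, 95, 95,
             95, 96, 96, 99, 100, 101, 101, 103, 103, 104, 104, 104, 104,
             104, 104, 109, 109, 109, 110, 110, 110, 110, 110, 110, 112)
)

# Stand-in for the question's loop: mean Color over days i-3 .. i+3
days <- min(hares$Julian):max(hares$Julian)
dayMeans <- sapply(days, function(i) mean(hares$Color[abs(hares$Julian - i) <= 3]))
haresAveNoNa <- na.omit(cbind(days, dayMeans))

# Convert the matrix to a data frame whose column names match the plot's aesthetics
haresAveDf <- as.data.frame(haresAveNoNa)
names(haresAveDf) <- c("Julian", "Color")

p <- ggplot(hares, aes(Julian, Color)) +
  geom_jitter(width = 1, height = 5, color = "blue", alpha = 0.65) +
  geom_line(data = haresAveDf)  # inherits aes(Julian, Color) from ggplot()
p
```

With the three-column matrix from the question, the same conversion applies. You would just supply three column names, for example Julian, Color and a name of your choice for the standard deviation column.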
To label the x-axis with dates instead of the Julian value, first convert Julian to Date format. I don't know the actual mapping from Julian to date in your data, so for this example let's assume that Julian is the day of the year, counting the first day of the year as 1, and let's assume the year is 2015.
# Day 1 maps to 2015-01-01; an explicit origin also works in R versions before 4.3
hares$Date = as.Date(hares$Julian - 1, origin="2015-01-01")
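A quick way to sanity-check the conversion, still assuming day-of-year numbering in 2015, is to compare it against dates parsed with the "%j" (day of year) format. The vector julian below is a small hypothetical sample, not your full data.

```r
# Assumption: Julian is day of year (1 = January 1) and the year is 2015
julian <- c(1, 85, 112)

# Offset from January 1 using an explicit origin
byOrigin <- as.Date(julian - 1, origin = "2015-01-01")

# Same dates built from year plus zero-padded day of year
byDayOfYear <- as.Date(sprintf("2015-%03d", julian), format = "%Y-%j")

print(byOrigin)
print(identical(byOrigin, byDayOfYear))
```

If your Julian values count from a different starting date, only the origin needs to change.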
Now we plot using our new Date column for the x-axis. To customize both the number of breaks and the date labels, use scale_x_date.
# fill=NA pads the ends of the rolling mean (na.pad=TRUE is deprecated in zoo)
ggplot(hares, aes(Date, Color)) +
  geom_point(position=position_jitter(1,3), pch=21, fill="#FF0000AA") +
  geom_line(aes(y=rollmean(Color, 7, fill=NA))) +
  theme_bw() +
  scale_x_date(date_breaks="weeks", date_labels="%b %e")