I am trying to plot a vector, y, which has 604800 points, against a sequence, x = seq(from=1, to=604800). This is not a problem, but I do need to add a loess curve to the plots.
I have tried this using ggplot2, but this takes forever, and ggplot2 is notoriously slow at plotting large datasets. See R code:
vf <- ggplot(single.prop, aes(x,y)) + geom_line(linetype=1, size=1)
vf <- vf + stat_smooth(method="loess",fullrange=TRUE,aes(outfit=fit1<<-..y..))
vf
I have now tried to use the base package, but this is also taking forever:
lw <- loess(y ~ x,data=single.prop)
plot(y ~ x, data=single.prop,pch=19,cex=0.1)
k <- order(single.prop$x)
lines(single.prop$x[k],lw$fitted[k],col="red",lwd=3)
Does anyone have any suggestions for making this run quicker? I have to do this multiple times, and I have so far been waiting about 15 minutes for one plot, which has still not completed.
With this many data points, the plot can indeed take a long time to render. It depends on the data, of course, but a plot with this many points often does not give a very interpretable picture. For both speed and interpretability, it can be useful to calculate summary statistics first and then plot those. In your situation, I can imagine that binning on x and calculating one or more statistics of y for every bin would be useful. I did a small example with the mean, but you can of course use whichever statistic you like. Hope this helps.
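library(ggplot2)

# Simulated data: a linear trend plus standard normal noise
x <- 1:10^6
y <- x/10^5 + rnorm(10^6)
plot_dat <- data.frame(x, y)
# Plot of all 10^6 raw points; printing p is slow
p <- ggplot(plot_dat, aes(x, y)) + geom_point()

# Average y over consecutive bins of bin_size rows
bin_plot_dat <- function(bin_size){
  # Label each row with its bin number times bin_size (a partial last bin is allowed)
  x2 <- ceiling(seq_len(nrow(plot_dat)) / bin_size) * bin_size
  y2 <- tapply(plot_dat$y, x2, mean)
  data.frame(x = unique(x2), y = as.vector(y2))
}

plot_dat2 <- bin_plot_dat(50)
p2 <- ggplot(plot_dat2, aes(x, y)) +
  geom_point()
# With 1000 or more points geom_smooth() defaults to a GAM; use method = "loess" for loess
p2 + geom_smooth()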
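If you would rather stay with base graphics, the same binning idea should carry over to your loess() call. The following is only a rough sketch under some assumptions: single.prop is simulated here because I do not have your data, and the bin width of 60 is an arbitrary choice that you would need to tune. The loess fit then sees the bin means rather than all 604800 raw points.

```r
# Sketch: fit loess to bin means instead of the raw points.
# The data below are simulated stand-ins for the real single.prop.
set.seed(1)
n <- 604800
single.prop <- data.frame(x = seq_len(n), y = seq_len(n) / 1e5 + rnorm(n))

bin_size <- 60                               # assumed bin width; tune to taste
bin_id <- ceiling(seq_len(n) / bin_size)     # 1, 1, ..., 2, 2, ...
binned <- data.frame(
  x = as.vector(tapply(single.prop$x, bin_id, mean)),  # bin centre
  y = as.vector(tapply(single.prop$y, bin_id, mean))   # bin mean
)

lw <- loess(y ~ x, data = binned)            # 10080 points instead of 604800
plot(y ~ x, data = binned, pch = 19, cex = 0.1)
lines(binned$x, fitted(lw), col = "red", lwd = 3)
```

Because tapply() returns the bins in increasing order, binned$x is already sorted and the order() step from the question is not needed. Note that the curve smooths the bin means, so it will not be identical to a loess fit on the raw data.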