I have a model that indicates an interaction between two variables, a and b, which significantly influences my response variable, c. All three are continuous numeric variables. In detail, c is the rate of change in my response variable, b is the rate of change in my predictor, and a is mean annual rainfall. The unit of analysis is a pixel in a raster. So my model is telling me that mean annual rainfall modifies how my predictor affects my response.
To visualise this interaction I would like to use a contour plot, heat map or level plot with a and b on the x and y axes and c mapped to colour, to show how my response variable changes within the space described by a and b. I can do this with a scatter plot, but it's not very pretty or easy to interpret:
qplot(b, a, colour = c) +
  scale_colour_gradient(low = "green", high = "red")
When I try to draw a contour plot, heat map or level plot, though, all I get are errors, blank plots or ugly plots.
geom_contour fails with a warning:
ggplot(data = Mod, aes(x = Rain, y = Bomas, z = Fire)) +
geom_contour()
Warning message:
Not possible to generate contour data
geom_raster initially gives me "Error: cannot allocate vector of size 81567.2 Gb", but when I round my data it produces:
ggplot(data = df, aes(x = a, y = b, z = c)) +
geom_raster(aes(fill = c))
Adding interpolate = TRUE to the geom_raster code just makes the lines a little blurry.
geom_tile produces a blank graph, but with a scale bar for c:
ggplot(data = df, aes(x = a, y = b, z = c)) +
geom_tile(aes(color = c))
I've also tried using stat_density2d and setting the fill and/or the colour to c, but just got an error. I've tried levelplot in the lattice package as well, but that produces this:
levelplot(c ~ a * b, data = df,
aspect = "asp", contour = TRUE,
xlab = "a",
ylab = "b")
I suspect the problems I'm encountering arise because these functions are not set up to deal with continuous x and y variables; all the examples seem to use factors. I would have thought I could compensate for that by changing bin widths, but that doesn't seem to work either. Is there a function that allows you to make a heat map with three continuous variables? Or do I need to treat my a and b variables as factors and manually build a data frame with bins appropriate for my data?
If you want to experiment for yourself, you get problems similar to mine with:
# Three independent standard-normal columns, 1068 rows each
df <- data.frame(a = rnorm(1068), b = rnorm(1068), c = rnorm(1068))
You can get automatic bins and, for example, calculate the mean of c within each bin by using stat_summary_2d:
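library(ggplot2)
# Bin a and b on a 2D grid, fill each bin by the mean of c (the default summary)
ggplot(df, aes(a, b, z = c)) +
  stat_summary_2d() +
  geom_point(shape = 1, col = 'white') +  # overlay the raw points
  viridis::scale_fill_viridis()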
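If the default binning is too fine or too coarse, the bin count and the summary function can be set explicitly. The following is only a sketch on the random example data: the seed, the choice of 15 bins and the median summary are illustrative assumptions, not values tuned for the rainfall data.

```r
library(ggplot2)

# Random example data, as in the question (the seed is an arbitrary choice)
set.seed(1)
df <- data.frame(a = rnorm(1068), b = rnorm(1068), c = rnorm(1068))

# 15 x 15 bins over a and b; each bin is filled by the median of c.
# Bins that contain no observations are dropped by default.
p_binned <- ggplot(df, aes(a, b, z = c)) +
  stat_summary_2d(fun = "median", bins = 15) +
  scale_fill_gradient(low = "green", high = "red")

p_binned
```

Fewer bins give smoother but coarser tiles; with sparse data, many small bins mostly show noise from bins holding one or two points.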
Another good option is to slice your data by the third variable and plot small multiples. This doesn't show up very well for random data, though:
library(ggplot2)
ggplot(df, aes(a, b)) +
geom_point() +
facet_wrap(~cut_number(c, 4))
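geom_contour and geom_raster expect values on a regular grid of x and y, which scattered observations do not provide. Since the goal is to show a model's interaction, one possible approach is to predict the fitted surface on a regular grid and plot that instead. This is a rough sketch under loud assumptions: the data are simulated with a made-up interaction, and lm(c ~ a * b) is a stand-in for whatever model was actually fitted; the names fit, grid and c_hat are hypothetical.

```r
library(ggplot2)

# Simulated data with an a:b interaction (coefficients are invented for illustration)
set.seed(1)
df <- data.frame(a = rnorm(1068), b = rnorm(1068))
df$c <- 0.5 * df$a + 0.3 * df$b + 0.8 * df$a * df$b + rnorm(1068)

# Stand-in model; replace with the model actually used
fit <- lm(c ~ a * b, data = df)

# Regular 100 x 100 grid spanning the observed range of a and b
grid <- expand.grid(
  a = seq(min(df$a), max(df$a), length.out = 100),
  b = seq(min(df$b), max(df$b), length.out = 100)
)
grid$c_hat <- predict(fit, newdata = grid)

# Heat map of the predicted response with contour lines on top
p_surface <- ggplot(grid, aes(a, b)) +
  geom_raster(aes(fill = c_hat)) +
  geom_contour(aes(z = c_hat), colour = "white") +
  scale_fill_gradient(low = "green", high = "red")

p_surface
```

Note that this shows the model's predictions rather than the raw data, so the corners of the grid where few observations fall are extrapolations and should be read with caution.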