I am plotting some data using facet_grid(), and I noticed something puzzling.
I should say up front that I am a beginner with ggplot2 and may have missed something. Anyhow, here it goes.
Assuming the following dataframe:
library(ggplot2)
d1 <- runif(500)
d2 <- runif(500)*10
s1 <- sample(LETTERS[1:2], 500, replace = T, prob=c(0.3, 0.7))
s2 <- sample(letters[3:4], 500, replace = T, prob=c(0.4, 0.6))
df <- data.frame(s1, s2, d1, d2)
which looks like this:
s2 s1 d1 d2
c B 0.3434944 0.9881925
d A 0.7847741 9.7759946
d A 0.3142764 2.3654268
...
I plot the data split into facets according to the categorical values:
ggplot(df, aes(x=df$d1, y=df$d2)) +
geom_point(col="red", cex=2) +
facet_grid(s1 ~ s2)
Resulting in the following plot:
I now want to overplot only a subset of the data, and I used the following (here simplified) code:
geom_point(data=df[df$d2 > 7.5,],
aes(x=df$d1[df$d2 > 7.5], y=df$d2[df$d2 > 7.5]),
cex=1, colour=I("black"))
Resulting in the following plot:
Now, having set a threshold, I expected every value above the threshold to be drawn on top of an existing point.
This does not appear to be the case.
In fact, some existing points have no matching thresholded point, and some thresholded points have no matching existing point. What puzzles me most is that, as I understand it, all the points come from the same dataframe, so I expect the first layer (the existing points) to contain the second layer. Am I missing something here?
Also, looking carefully, the overplotted points sit at the right x/y position, but they are in the wrong facet.
Even more puzzling: if I plot the following subsets:
ggplot(df[df$d2 < 7.5,], aes(x=df$d1[df$d2 < 7.5], y=df$d2[df$d2 < 7.5])) +
geom_point(col="red", cex=2) +
facet_grid(s1 ~ s2) +
geom_point(data=df[df$d2 > 7.5,], aes(x=df$d1[df$d2 > 7.5], y=df$d2[df$d2 > 7.5]), cex=1, colour=I("black"))
Some of the pre-existing values move from the region "above threshold" to that "below threshold". Can anybody explain such behaviour?
Thanks a lot.
I can't explain exactly why this happens, but I think the subsets inside the plot call were not being matched to the facets. By creating a new TRUE/FALSE column in the dataframe, we can control the colour and size of the points within each individual facet. Is this any good?
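A minimal sketch of the flag-column idea, assuming the same simulated data as in the question. The column name `above`, the seed, and the particular colours and sizes are my own illustrative choices, not something from the original code:

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
df <- data.frame(
  s1 = sample(LETTERS[1:2], 500, replace = TRUE, prob = c(0.3, 0.7)),
  s2 = sample(letters[3:4], 500, replace = TRUE, prob = c(0.4, 0.6)),
  d1 = runif(500),
  d2 = runif(500) * 10
)

# Hypothetical flag column: TRUE for rows above the threshold
df$above <- df$d2 > 7.5

# Colour and size are mapped to the flag, so every row keeps its own
# s1/s2 values and lands in the facet it belongs to.
p <- ggplot(df, aes(x = d1, y = d2, colour = above, size = above)) +
  geom_point(show.legend = FALSE) +
  facet_grid(s1 ~ s2) +
  scale_colour_manual(values = c("FALSE" = "red", "TRUE" = "black")) +
  scale_size_manual(values = c("FALSE" = 2, "TRUE" = 1))

print(p)
```

Because there is only one layer and one dataframe, there is nothing for the facets to get out of step with.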
EDIT: Using hollow points (shape=21) and scale_fill_manual to address the question exactly.
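# Flag the rows above the threshold
df$d <- df$d2 > 7.5
# Hollow points (shape 21): red outline on every point, fill taken from the flag
ggplot(data = df, aes(x = d1, y = d2, fill = d)) +
  facet_grid(s1 ~ s2) +
  geom_point(show.legend = FALSE, shape = 21, size = 2, stroke = 1.5, colour = "red") +
  scale_fill_manual(values = c("TRUE" = "black", "FALSE" = "red"))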
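If you would rather keep two separate layers, as in the question, the following sketch may also work. My guess, not verified against the ggplot2 internals, is that vectors written as `df$...` inside aes() are taken straight from the global dataframe, so they are not guaranteed to stay aligned with the rows that each facet receives. Using bare column names and passing the subset through the `data` argument avoids that. The seed is an assumption added for reproducibility:

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
df <- data.frame(
  s1 = sample(LETTERS[1:2], 500, replace = TRUE, prob = c(0.3, 0.7)),
  s2 = sample(letters[3:4], 500, replace = TRUE, prob = c(0.4, 0.6)),
  d1 = runif(500),
  d2 = runif(500) * 10
)

# Bare column names in aes() are looked up in each layer's own data,
# so x, y and the facet variables all come from the same rows.
p <- ggplot(df, aes(x = d1, y = d2)) +
  geom_point(colour = "red", size = 2) +
  # The second layer inherits aes(x = d1, y = d2) and applies it to the subset
  geom_point(data = df[df$d2 > 7.5, ], colour = "black", size = 1) +
  facet_grid(s1 ~ s2)

print(p)
```

Every black point should then sit on top of a red point in the same facet, since both layers are drawn from rows of the same dataframe.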