
"facet_grid" and overplot: puzzling behaviour


I am plotting some data using facet_grid(), and I noticed something puzzling.

I should say up front that I am a beginner with ggplot2, so I might have missed something. Anyhow, here it goes.

Assuming the following dataframe:

library(ggplot2)

d1 <- runif(500)
d2 <- runif(500)*10
s1 <- sample(LETTERS[1:2], 500, replace = T, prob=c(0.3, 0.7))
s2 <- sample(letters[3:4], 500, replace = T, prob=c(0.4, 0.6))
df <- data.frame(s1, s2, d1, d2)

which looks like this:

s1 s2 d1        d2
B  c  0.3434944 0.9881925
A  d  0.7847741 9.7759946
A  d  0.3142764 2.3654268
...

I plot the data faceted by the two categorical variables:

ggplot(df, aes(x=df$d1, y=df$d2)) +
  geom_point(col="red", cex=2) +
  facet_grid(s1 ~ s2)

Resulting in the following plot:

Plot 1

I now want to overplot only a subset of the data, and I used the following (here simplified) code:

geom_point(data=df[df$d2 > 7.5,],
aes(x=df$d1[df$d2 > 7.5], y=df$d2[df$d2 > 7.5]),
cex=1, colour=I("black"))

Resulting in the following plot:

Plot 2

Having set a threshold, I expected every point above the threshold to be drawn on top of an existing point.

This does not appear to be the case.

In fact, some existing points have no matching thresholded point, and some thresholded points have no matching existing point. What puzzles me most is that both layers come from the same dataframe, so I expected the first layer (all points) to contain the second layer (the thresholded subset). Am I missing something here?

Also, on close inspection, the overplotted points sit at the right 2D position but in the wrong quadrant.

Even more puzzling: if I plot the following subsets:

ggplot(df[df$d2 < 7.5,], aes(x=df$d1[df$d2 < 7.5], y=df$d2[df$d2 < 7.5])) +
  geom_point(col="red", cex=2) +
  facet_grid(s1 ~ s2) +
  geom_point(data=df[df$d2 > 7.5,], aes(x=df$d1[df$d2 > 7.5], y=df$d2[df$d2 > 7.5]), cex=1, colour=I("black"))

Some of the red points move from the "above threshold" region to the "below threshold" region. Can anybody explain this behaviour?

Thanks a lot.


Solution

  • I can't explain exactly why this happens, but I think the subsets built inside the plotting call do not respect the facets. By adding a new TRUE/FALSE column to the dataframe, we can control the colour and size of the points within each facet. Does this help?

    EDIT: Using hollow points (shape=21) and scale_fill_manual to address the question exactly.

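    # Flag the points above the threshold
    df$d <- df$d2 > 7.5

    # One layer: red outline for every point, fill colour taken from the flag
    ggplot(data = df, aes(x = d1, y = d2, colour = d, size = d, fill = d)) +
        facet_grid(s1 ~ s2) +
        geom_point(show.legend = FALSE, shape = 21, size = 2, stroke = 1.5, col = "red") +
        scale_fill_manual(values = setNames(c("black", "red"), c(TRUE, FALSE)))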
    

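A likely cause of the mismatch is the use of df$... inside aes(): ggplot2's documentation advises against it, because the vectors are taken from outside the layer data and may not be split by facet the way the rows are. As a hedged sketch of the two-layer approach the question was aiming for, the version below uses bare column names and passes the thresholded subset as the data of the second layer. The names threshold, above and p, as well as the set.seed() call, are assumptions added for illustration and are not part of the original post.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only so the sketch is reproducible

# Same structure as the dataframe in the question
df <- data.frame(
  s1 = sample(LETTERS[1:2], 500, replace = TRUE, prob = c(0.3, 0.7)),
  s2 = sample(letters[3:4], 500, replace = TRUE, prob = c(0.4, 0.6)),
  d1 = runif(500),
  d2 = runif(500) * 10
)

threshold <- 7.5                  # same cut-off as in the question
above <- df[df$d2 > threshold, ]  # the subset keeps s1 and s2, so it can be faceted

p <- ggplot(df, aes(x = d1, y = d2)) +                    # bare column names, no df$
  geom_point(colour = "red", size = 2) +                  # all points
  geom_point(data = above, colour = "black", size = 1) +  # inherits aes(x = d1, y = d2)
  facet_grid(s1 ~ s2)

print(p)
```

Because the second layer carries its own s1 and s2 columns, each black point should land in the same panel as the red point it came from.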