I thought this would be easy, but I couldn't find a solution. I am trying to generate a ggplot2 plot in R showing the correlation between col1 and col2, with the size of each dot set by col3 and the shape by col4. col3 and col4 have NA/missing values. When I run the code below, ggplot2 removes the rows with a missing col3 and/or col4; however, I want to keep these rows and color-code them.
ggplot(df, aes(x=df$col1, y=df$col2)) +
  geom_point(aes(size=df$col3, col=df$col4), na.rm=FALSE) +
  theme_classic() +
  scale_size(range = c(0.25,4))
Output:
Warning: Removed 3 rows containing missing values (geom_point).
I have tried:
geom_point(aes(size=df$col3, col=df$col4), na.rm=FALSE)
scale_size(range = c(0.25,4), na.value = 0) #to give a 0 value to the na.value (although would rather not)
But I ended up with "Ignoring unknown aesthetics: na.rm" for #2 and #3, and #1 gave an error. Also, that doesn't fix the issue that the col4 shapes are being removed too.
Example dataframe:
+-------------+-------------+-------------+----------+
| col1 | col2 | col3 | col4 |
+-------------+-------------+-------------+----------+
| 0.254393811 | 0.124242905 | NA | NA |
| 0.28223149 | 0.148601748 | 0.236953099 | CD8CTL |
| 0.205945835 | 0.074541695 | NA | NA |
| 0.199758631 | 0.103369485 | NA | CD8Mem |
| 0.2798128 | 0.109511863 | 0.396113132 | CD8STAT1 |
| 0.254616042 | 0.059495241 | 0.479590212 | CD8CTL |
| 0.197929395 | 0.10993698 | 0.272611442 | CD8CTL |
| 0.294888359 | 0.12319682 | 0.16069263 | CD8CTL |
| 0.191407446 | 0.086443936 | 0.36596486 | CD8CTL |
| 0.267533392 | 0.11240525 | 0.344659516 | CD8CTL |
+-------------+-------------+-------------+----------+
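For anyone who wants to reproduce this, the table above can be typed in as a data frame. This is only a sketch: it assumes col4 is a plain character column (it may well be a factor in the real data), and the final call is just a quick way to count the missing values per column.

```r
# Rebuild the example data frame from the table above.
# Assumption: col4 is stored as character, not factor.
df <- data.frame(
  col1 = c(0.254393811, 0.28223149, 0.205945835, 0.199758631, 0.2798128,
           0.254616042, 0.197929395, 0.294888359, 0.191407446, 0.267533392),
  col2 = c(0.124242905, 0.148601748, 0.074541695, 0.103369485, 0.109511863,
           0.059495241, 0.10993698, 0.12319682, 0.086443936, 0.11240525),
  col3 = c(NA, 0.236953099, NA, NA, 0.396113132,
           0.479590212, 0.272611442, 0.16069263, 0.36596486, 0.344659516),
  col4 = c(NA, "CD8CTL", NA, "CD8Mem", "CD8STAT1",
           "CD8CTL", "CD8CTL", "CD8CTL", "CD8CTL", "CD8CTL"),
  stringsAsFactors = FALSE
)

# Count the missing values in each column.
na_counts <- colSums(is.na(df))
na_counts
```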
There are a few things to note. I think I have understood what the OP is looking to do here: you want all points to plot. Here is how we want the plot to look:
- col1 is used to plot the x axis
- col2 is used to plot the y axis
- col3 is used to control the size of the point
- col4 is used to control the color of the point
We have NA values in col3 and col4. So what to do with those? For color, I'm going to keep them in the legend, color-coded and labeled as "NA". What about size? Well, size=NA doesn't make any sense, so I think the best thing to do for rows where col3 is NA is to change the shape. Here's what I've done:
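library(ggplot2)
ggplot(df, aes(x=col1, y=col2, color=col4)) +
  # all rows go in; rows with NA in col3 are dropped from this layer
  geom_point(aes(size=col3, shape='Not NA')) +
  # add the NA-size rows back with a fixed size and their own shape label
  geom_point(data=subset(df, is.na(col3)), aes(shape='NA'), size=3) +
  # map the two shape labels to actual point shapes
  scale_shape_manual(values=c('NA'=3, 'Not NA'=19)) +
  theme_classic()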
First of all, it's bad form to reference columns via data.frame$column.name inside aes(). You should use just the column name itself, which ggplot2 looks up in the data you pass in.
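To illustrate why bare column names are safer, here is a small sketch using a made-up data frame (toy, with columns a, b and g, none of which come from the OP's data). With bare names the mapping follows whatever data the plot receives; with toy$a the full-length vector is used no matter what, which no longer lines up once the data is subset.

```r
library(ggplot2)

# Hypothetical toy data, not the OP's.
toy <- data.frame(a = c(1, 2, 3, 4), b = c(4, 3, 2, 1),
                  g = c("u", "u", "v", "v"))

# Bare names are evaluated in the data the plot receives,
# so the mapping still matches after subsetting.
p_ok <- ggplot(subset(toy, g == "u"), aes(x = a, y = b)) + geom_point()
n_ok <- nrow(ggplot_build(p_ok)$data[[1]])

# toy$a always has four values, but the subset has two rows,
# so building this plot is expected to fail with a length mismatch.
p_bad <- ggplot(subset(toy, g == "u"), aes(x = toy$a, y = toy$b)) + geom_point()
res_bad <- try(suppressWarnings(ggplot_build(p_bad)), silent = TRUE)
```

The same mismatch is what makes df$col3 fragile as soon as a layer gets its own data argument, as the second geom_point() in this answer does.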
Color is easy: we just put color=col4 in the top aes() specification, since it's applied to every geom.
For the shape, it's probably easiest to use two separate calls to geom_point(). The first has no special handling, so it will naturally drop any NAs: you won't get points plotted with size=NA. To "add back in" the NA points, we have to pull those rows out specifically and give them a fixed size. Finally, to get the shape aesthetic into a legend, we need to put it inside aes(). The general rule is that if you set an aesthetic equal to a column name inside aes(), the values in that column are used for labelling. If you instead put a constant string inside aes(), as we did here, every point in that geom call is labeled with that string, and a legend is still created. So we are basically building our own custom legend for shape here.
Then it's just a matter of using scale_shape_manual() and a named vector for the values argument to set the actual shape we want to use.
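The difference between a constant inside aes() and the same value outside it can be seen in a small sketch. The data frame toy and its columns a and b are made up for illustration, and peeking at the layer's mapping and aes_params fields is just one way to see where ggplot2 stores each kind of setting.

```r
library(ggplot2)

# Hypothetical toy data, not the OP's.
toy <- data.frame(a = 1:3, b = 3:1)

# Constant string inside aes(): treated like a one-level variable, so it
# goes through the shape scale and gets a legend entry labeled "NA".
p_mapped <- ggplot(toy, aes(a, b)) +
  geom_point(aes(shape = "NA"), size = 3) +
  scale_shape_manual(values = c("NA" = 3))

# Same shape set outside aes(): a fixed parameter, no scale and no legend.
p_fixed <- ggplot(toy, aes(a, b)) +
  geom_point(shape = 3, size = 3)

mapped_aes  <- names(p_mapped$layers[[1]]$mapping)   # aesthetics mapped via aes()
fixed_shape <- p_fixed$layers[[1]]$aes_params$shape  # value fixed outside aes()
```

Printing p_mapped and p_fixed side by side should show the same crosses, with a shape legend only on the first.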
Thinking about this a bit more, it doesn't make sense for NA to appear in the legends for both color and shape, so let's remove it from color. That's done by completely separating the subset of the data with NAs in col3 from the subset without:
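library(ggplot2)
ggplot(df, aes(x=col1, y=col2, color=col4)) +
  # only rows with a col3 value: size comes from col3
  geom_point(data=subset(df, !is.na(col3)), aes(size=col3, shape='Not NA')) +
  # only rows with NA in col3: fixed size, separate shape label
  geom_point(data=subset(df, is.na(col3)), aes(shape='NA'), size=3) +
  scale_shape_manual(values=c('NA'=3, 'Not NA'=19)) +
  theme_classic()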
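As a sanity check that no rows are silently lost with this two-layer approach, one option is to count how many rows each layer receives. This is a sketch under assumptions: the df below is a simplified stand-in with the same NA pattern as the example (col1 and col2 are made-up evenly spaced values), and ggplot_build() is used only to inspect the per-layer data.

```r
library(ggplot2)

# Hypothetical stand-in for df: same NA pattern, simplified x/y values.
df <- data.frame(
  col1 = seq(0.19, 0.28, length.out = 10),
  col2 = seq(0.06, 0.15, length.out = 10),
  col3 = c(NA, 0.24, NA, NA, 0.40, 0.48, 0.27, 0.16, 0.37, 0.34),
  col4 = c(NA, "CD8CTL", NA, "CD8Mem", "CD8STAT1", rep("CD8CTL", 5))
)

p <- ggplot(df, aes(x = col1, y = col2, color = col4)) +
  geom_point(data = subset(df, !is.na(col3)),
             aes(size = col3, shape = "Not NA")) +
  geom_point(data = subset(df, is.na(col3)), aes(shape = "NA"), size = 3) +
  scale_shape_manual(values = c("NA" = 3, "Not NA" = 19)) +
  theme_classic()

# Rows handed to each layer; together they should account for every row of df.
layer_rows <- vapply(ggplot_build(p)$data, nrow, integer(1))
layer_rows
```

If the two counts do not add up to nrow(df), some rows are falling through both subsets.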