
How to make a scatterplot rectangular matrix: (y1, y2, ...) ~ (x1, x2, ...) in R: ggplot2, lattice or base


I'm looking for a way to construct a rectangular matrix of scatterplots of each of several y variables against each of several x variables, similar to the base pairs() function or GGally::ggpairs(), but showing only the y ~ x combinations.

Ideally, this should support (a) coloring points by groups; (b) adding annotations such as linear regression lines.

Here is a quick example of the basic idea:

> data(Rohwer, package = "heplots")
> str(Rohwer)
'data.frame':   69 obs. of  10 variables:
 $ group: int  1 1 1 1 1 1 1 1 1 1 ...
 $ SES  : Factor w/ 2 levels "Hi","Lo": 2 2 2 2 2 2 2 2 2 2 ...
 $ SAT  : int  49 47 11 9 69 35 6 8 49 8 ...
 $ PPVT : int  48 76 40 52 63 82 71 68 74 70 ...
 $ Raven: int  8 13 13 9 15 14 21 8 11 15 ...
 $ n    : int  1 5 0 0 2 2 0 0 0 3 ...
 $ s    : int  2 14 10 2 7 15 1 0 0 2 ...
 $ ns   : int  6 14 21 5 11 21 20 10 7 21 ...
 $ na   : int  12 30 16 17 26 34 23 19 16 26 ...
 $ ss   : int  16 27 16 8 17 25 18 14 13 25 ...

I'll plot 3 ys against 3 xs here:

yvars <- c("SAT", "PPVT", "Raven" )
xvars <- c("n", "s", "ns", "na", "ss")
xvars <- c("n", "s", "ns")  # smaller example
gp <- "SES"

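# one row of panels per y variable, one column per x variable
op <- par(mfrow = c(length(yvars), length(xvars)),
          mar = c(4, 4, 1, 1)+.1)
for(y in yvars) {
  for (x in xvars) {
    plot(Rohwer[, x], Rohwer[, y],
         xlab=x, ylab=y)
    # overall least-squares line for this y ~ x pair
    abline(lm(Rohwer[, y] ~ Rohwer[, x]))
  }
}
par(op)  # restore the previous graphics settings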


But of course, I'd like to drop the y-axis labels everywhere except in column 1 and the x-axis labels everywhere except in the last row, and also be able to enhance the contents of each plot further: different point symbols/colors for groups, separate regression lines for each group, and so on.
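A base-graphics sketch of these enhancements, loosely modeled on how pairs() lays out its panels (thin inner margins, axes drawn with xpd = NA so they can extend into the outer margin). The colour palette via hcl.colors() (R >= 3.6.0), the solid point symbol and the margin sizes are assumptions, not part of the original code.

```r
# Sketch: y ~ x scatterplot matrix in base graphics, with axes only on the
# outer panels, points coloured by group and one lm() line per group.
data(Rohwer, package = "heplots")

yvars <- c("SAT", "PPVT", "Raven")
xvars <- c("n", "s", "ns")
gp    <- "SES"

groups <- factor(Rohwer[[gp]])
cols   <- hcl.colors(nlevels(groups), "Dark 3")  # assumed palette: one colour per group

op <- par(mfrow = c(length(yvars), length(xvars)),
          mar = rep(0.5, 4),    # thin gaps between panels
          oma = c(5, 5, 1, 1))  # outer margins hold axes and variable names

for (i in seq_along(yvars)) {
  for (j in seq_along(xvars)) {
    x <- Rohwer[[xvars[j]]]
    y <- Rohwer[[yvars[i]]]
    plot(x, y, col = cols[groups], pch = 16, axes = FALSE, ann = FALSE)
    box()
    # x axis and x variable name only in the last row
    if (i == length(yvars)) {
      axis(1, xpd = NA)
      mtext(xvars[j], side = 1, line = 3, outer = TRUE,
            at = (j - 0.5) / length(xvars))
    }
    # y axis and y variable name only in the first column
    if (j == 1) {
      axis(2, xpd = NA)
      mtext(yvars[i], side = 2, line = 3, outer = TRUE,
            at = 1 - (i - 0.5) / length(yvars))
    }
    # separate regression line for each group level
    for (k in seq_len(nlevels(groups))) {
      sel <- groups == levels(groups)[k]
      abline(lm(y[sel] ~ x[sel]), col = cols[k])
    }
  }
}
par(op)  # restore the previous graphics settings
```

In mtext(..., outer = TRUE) the coordinates along the outer margin run from 0 to 1, which is why the column and row centres are computed as fractions.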

I've looked at base R pairs(), GGally::ggpairs() and lattice::splom() but don't see any way to do what I want.
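For comparison, a lattice sketch: lattice's extended formula interface in xyplot() accepts several variables on each side of ~, and outer = TRUE should put each y ~ x combination in its own panel, labelled in the strip. Treat this as an untested assumption about how the panels are arranged; with relation = "free" every panel draws its own axes, so it does not yet address the outer-labels-only requirement.

```r
library(lattice)
data(Rohwer, package = "heplots")

# Every y on the left against every x on the right;
# outer = TRUE gives one panel per combination instead of superposing them.
p <- xyplot(SAT + PPVT + Raven ~ n + s + ns, data = Rohwer,
            outer = TRUE,
            groups = SES,                      # colour points by group
            type = c("p", "r"),                # points plus a per-group lm line
            scales = list(relation = "free"),  # x and y ranges differ by variable
            layout = c(3, 3),                  # 9 panels in a 3 x 3 grid
            auto.key = list(columns = 2))      # legend for the SES groups
print(p)
```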


Solution

  • One option to achieve your result using ggplot2 would be to use GGally::ggmatrix:

    library(heplots)
    
    data(Rohwer, package = "heplots")
    
    yvars <- c("SAT", "PPVT", "Raven" )
    xvars <- c("n", "s", "ns")
    gp <- "SES"
    
    grid <- expand.grid(x = xvars, y = yvars, stringsAsFactors = FALSE)
    
    library(ggplot2)
    library(GGally)
    
    plot_fun <- function(x, y) {
      ggplot(Rohwer, aes(.data[[x]], .data[[y]], color = .data[[gp]])) +
        geom_point() +
        geom_smooth(method = "lm", se = FALSE, formula = y ~ x)
    }
    
    p <- Map(plot_fun, grid$x, grid$y)
    
    ggmatrix(p, nrow = length(yvars), ncol = length(xvars), xAxisLabels = xvars, yAxisLabels = yvars)
    

    Or, as a second option, you could convert the data to long format and use faceting, which IMHO makes it easier to customize your plot further:

    UPDATE: To keep the panels in the order given by xvars and yvars, convert xvar and yvar to factors, e.g. with an additional mutate() step.

    library(tidyr)
    library(dplyr)
    
    Rohwer_long <- Rohwer %>%
      pivot_longer(cols = all_of(xvars), names_to = "xvar", values_to = "x") %>%
      pivot_longer(cols = all_of(yvars), names_to = "yvar", values_to = "y") %>%
      mutate(xvar = factor(xvar, xvars), yvar = factor(yvar, yvars))
      
    ggplot(Rohwer_long, aes(x, y, color = SES)) +
      geom_point() +
      geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
      facet_grid(yvar ~ xvar, scales = "free")
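A possible further tweak to the faceting approach, sketched under the assumption of a reasonably recent ggplot2 (facet_grid() with switch, and the strip.placement theme element): moving the strips below and to the left of the axes makes the variable names sit where axis titles would, so the result reads more like a pairs()-style matrix. Because facet_grid() with scales = "free" still shares x within each column and y within each row, tick labels already appear only on the outer panels.

```r
library(ggplot2)
library(tidyr)
library(dplyr)
data(Rohwer, package = "heplots")

yvars <- c("SAT", "PPVT", "Raven")
xvars <- c("n", "s", "ns")

Rohwer_long <- Rohwer %>%
  pivot_longer(cols = all_of(xvars), names_to = "xvar", values_to = "x") %>%
  pivot_longer(cols = all_of(yvars), names_to = "yvar", values_to = "y") %>%
  mutate(xvar = factor(xvar, xvars), yvar = factor(yvar, yvars))

p <- ggplot(Rohwer_long, aes(x, y, color = SES)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
  # switch = "both" moves column strips to the bottom and row strips to the left
  facet_grid(yvar ~ xvar, scales = "free", switch = "both") +
  labs(x = NULL, y = NULL) +               # strips now act as axis titles
  theme(strip.placement = "outside",       # strips outside the tick labels
        strip.background = element_blank())
p
```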