
Individual y-axis for each level in parallel coordinates plot


I'm trying to generate a parallel coordinates plot, where each variable has its own axis. For instance:

[Image: target plot]

So far, I've used the function ggparcoord() from the package GGally. However, as far as I can tell, it does not allow each variable to have its own axis as above.

Does anyone know how this may be done using R, preferably using ggplot2? Thanks in advance.


Solution

  • I'm not aware of any packages that can do this, but it's not too difficult to draw the axes yourself in ggplot.

    Let's say we have a similar dataset to the one shown in your example plot:

    library(ggplot2)
    library(dplyr)
    
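    # Keep cyl, disp, hp, wt and qsec, move mpg to the last column,
    # and turn the row names into a "model" column
    cars <- mtcars %>% 
      select(c(2:4, 6:7, 1)) %>%
      tibble::rownames_to_column("model") %>%
      as_tibble()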
    
    cars
    #> # A tibble: 32 x 7
    #>    model               cyl  disp    hp    wt  qsec   mpg
    #>    <chr>             <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #>  1 Mazda RX4             6  160    110  2.62  16.5  21  
    #>  2 Mazda RX4 Wag         6  160    110  2.88  17.0  21  
    #>  3 Datsun 710            4  108     93  2.32  18.6  22.8
    #>  4 Hornet 4 Drive        6  258    110  3.22  19.4  21.4
    #>  5 Hornet Sportabout     8  360    175  3.44  17.0  18.7
    #>  6 Valiant               6  225    105  3.46  20.2  18.1
    #>  7 Duster 360            8  360    245  3.57  15.8  14.3
    #>  8 Merc 240D             4  147.    62  3.19  20    24.4
    #>  9 Merc 230              4  141.    95  3.15  22.9  22.8
    #> 10 Merc 280              6  168.   123  3.44  18.3  19.2
    #> # ... with 22 more rows
    

    We can calculate the axis breaks (and set the coordinates of the tick marks) using some simple arithmetic:

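    axis_df <- stack(cars[-1]) %>% 
      group_by(ind) %>% 
      # One row per tick mark; dplyr >= 1.1.0 prefers reframe() for this
      summarize(breaks = pretty(values, n = 10),
                # Rescale ticks with the same min-max mapping as the data lines
                yval = (breaks - min(values))/(max(values) - min(values))) %>%
      mutate(xmin = as.numeric(ind) - 0.05,    # left end of each tick mark
             xmax = as.numeric(ind),           # tick ends on the axis line
             x_text = as.numeric(ind) - 0.2)   # labels sit left of the ticks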
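
    For the tick marks to line up with the plotted lines, both have to go through the same min-max mapping. A minimal base R sketch for a single variable (mpg), where `rescale01` is a hypothetical helper that is not part of the answer's code:

    ```r
    # Worked example for one variable, base R only
    values <- mtcars$mpg                  # observed range: 10.4 to 33.9
    breaks <- pretty(values, n = 10)      # "round" tick values covering the data

    # Hypothetical helper: min-max rescale x relative to a reference vector
    rescale01 <- function(x, ref) (x - min(ref)) / (max(ref) - min(ref))

    data_y <- rescale01(values, ref = values)  # heights of the data points
    tick_y <- rescale01(breaks, ref = values)  # heights of the tick marks

    range(data_y)  # the data spans exactly 0 to 1
    range(tick_y)  # ticks may fall slightly outside, since pretty() covers the data range
    ```

    Because `pretty()` returns breaks that cover the data, the lowest tick sits at or below 0 and the highest at or above 1 on the shared scale.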
    

    And the coordinates for the axis lines themselves, which run from the lowest to the highest tick on each axis:

    axis_line_df <- axis_df %>% 
                      group_by(ind) %>%
                      summarize(min = min(yval), max = max(yval))
    

    Now we need to reshape and normalize the original data:

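    lines_df <- cars[-1] %>%
      # Min-max rescale every variable to the 0-1 range
      mutate(across(everything(), function(x) (x - min(x))/(max(x) - min(x)))) %>%
      # Long format: one row per car and variable (columns `values` and `ind`)
      stack() %>%
      # stack() keeps the row order within each variable, so the model names repeat
      mutate(row = rep(cars$model, ncol(cars) - 1))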
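
    The `rep()` call relies on `stack()` returning one block of rows per variable, in the original row order. As an alternative sketch, assuming the tidyr package is installed (it is not used in the answer's code), the model name can stay attached during the reshape. The names `vars` and `lines_long` are hypothetical:

    ```r
    library(dplyr)
    library(tidyr)

    # Same data preparation as in the answer
    cars <- mtcars %>%
      select(c(2:4, 6:7, 1)) %>%
      tibble::rownames_to_column("model") %>%
      as_tibble()

    vars <- names(cars)[-1]  # cyl, disp, hp, wt, qsec, mpg

    lines_long <- cars %>%
      # Min-max rescale each variable to the 0-1 range
      mutate(across(all_of(vars), function(x) (x - min(x)) / (max(x) - min(x)))) %>%
      # One row per car and variable; the model column is carried along
      pivot_longer(all_of(vars), names_to = "ind", values_to = "values") %>%
      # Keep the axes in column order rather than alphabetical order
      mutate(ind = factor(ind, levels = vars)) %>%
      rename(row = model)
    ```

    The result has the same `ind`, `values` and `row` columns as `lines_df`, so it should be usable in the plotting code in its place.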
    

    Finally, the plotting code would be something like:

    ggplot(lines_df, aes(ind, values, group = row)) + 
      # One line per car across the rescaled variables
      geom_line(color = "orange", alpha = 0.5) +
      # Vertical axis line for each variable
      geom_segment(data = axis_line_df, aes(x = ind, xend = ind, y = min, yend = max),
                   inherit.aes = FALSE) +
      # Tick marks
      geom_segment(data = axis_df, aes(x = xmin, xend = xmax, y = yval, yend = yval),
                   inherit.aes = FALSE) +
      # Tick labels, shown in the original units of each variable
      geom_text(data = axis_df, aes(x = x_text, y = yval, label = breaks),
                inherit.aes = FALSE) +
      # Variable names above each axis
      geom_text(data = axis_line_df, aes(x = ind, y = 1.2, label = ind),
                size = 6, inherit.aes = FALSE, check_overlap = TRUE, hjust = 1) +
      theme_void() +
      theme(plot.margin = margin(50, 20, 50, 20))
    

    Created on 2021-10-24 by the reprex package (v2.0.0)