Tags: r, ggplot2, colorbar, ggpubr, ggpairs

Making a ggpaired plot where line.color is a weighted function?


I have some data from before and after a treatment was applied, and wanted to look at the paired data, so I turned to ggpaired. I am able to get this to work fine with my data. (I simulated some data that is similar to what I am working with so that others could mess around with it.)

set.seed(123)
size <- 28
a.d <- round(runif(size, 1, 4))
g.d <- round(runif(size, 1, 4))
s.d <- round(runif(size, 1, 4))
p.d <- round(runif(size, 1, 4))
a.f <- a.d + round(runif(1,-1,1))
g.f <- g.d + round(runif(1,-1,1))
s.f <- s.d + round(runif(1,-1,1))
p.f <- p.d + round(runif(1,-1,1))

df.t <- data.frame("A" = c(a.d,a.f),"G" = c(g.d,g.f),"S" = c(s.d,s.f),"P" = c(p.d,p.f),"V" = c(rep("D", size),rep("F", size)))

Then to plot (I have installed and loaded the packages ggpubr, gridExtra, and ggplot2):

p <- list()
for(i in colnames(df.t[,-5])){
    d <- head(df.t[i],nrow(df.t)/2)
    d <- d[,i]
    f <- tail(df.t[i],nrow(df.t)/2)
    f <- f[,i]
    fin <- data.frame(draft = d, final = f)

    p[[i]] <- ggpaired(fin, cond1 = "draft", cond2 = "final", fill = "condition", line.color = "gray", line.size = 0.4, palette = "jco", xlab = "Draft version", ylab = paste(colnames(df.t[i]),"rating"), title = paste("Paired box plot of",colnames(df.t[i]),"ratings"))
}

do.call(grid.arrange,p)

which produces the image:

[Image: the four paired box plots arranged in a grid]

This is fine, but I have a lot of values that go from, say, a value of 2 pre-treatment and then are a value of 1 post-treatment, and you can't really visualize it with the line color as is. While googling, I came across this post, which isn't what I need, I think. I don't know the best way to phrase this question and keep finding results for edge line width for networks.

Basically, what I would like to do is this: if I have 11 observations that go from 3 to 2, I would like the line from 3 to 2 to be darker than, say, the line from 1 to 0, which only has 3 observations, a little like in this very quick mock-up I did in Paint:

[Image: mock-up in which lines shared by more observations are drawn darker]

I hope it is possible to do something like this with line.color (or maybe with line.weight?) by writing a function that colors the lines by weight (more specifically, by the number of observations). I'm rather new to R and don't know where to begin, and everything I find when searching for this topic is about network graphs, so any help would be appreciated.


Solution

  • What you want is certainly possible, but using ggpaired may not be the most straightforward way to get there (disclaimer: I don't use the ggpubr package much.)

    ggpaired is essentially a wrapper around the underlying ggplot2 package's functions. If you want to change how things are done, working with those underlying functions directly is a clean way to go about it. (If you intend to keep using R, getting down to brass tacks is also a good way to learn.)
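
To illustrate the wrapper point, here is a minimal sketch, assuming ggpubr and ggplot2 are installed. The toy data frame `fin` and its values are made up for the example; only the `ggpaired` arguments are taken from the question. It lists the geoms that `ggpaired` placed in the plot and then adds an ordinary ggplot2 component on top:

```r
library(ggplot2)
library(ggpubr)

# Hypothetical paired ratings (illustration only, not the question's data)
fin <- data.frame(draft = c(3, 3, 2, 2, 1, 4),
                  final = c(2, 2, 1, 1, 0, 3))

p <- ggpaired(fin, cond1 = "draft", cond2 = "final",
              fill = "condition", line.color = "gray", line.size = 0.4,
              palette = "jco")

# The result is a regular ggplot object ...
is.plot <- inherits(p, "ggplot")

# ... built from ordinary ggplot2 layers; list the geom class of each layer
layer.geoms <- vapply(p$layers, function(l) class(l$geom)[1], character(1))
print(layer.geoms)

# ... so it accepts further ggplot2 components like any other plot
p2 <- p + theme(legend.position = "right")
```

Because the object is plain ggplot2 underneath, anything `ggpaired` does not expose as an argument can be rebuilt with the corresponding geoms directly.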

    Here's how I'd do it, starting from the original dataframe df.t:

    library(dplyr)
    library(ggplot2)
    library(ggpubr) # provides theme_pubr()
    
    df.t %>%
      mutate(pair.order = rep(seq(1, n()/2), times = 2)) %>% # add new column to keep track of pairs
      tidyr::pivot_longer(cols = A:P, names_to = "facet") %>% # convert data to long form so all 
                                                              # values are captured in one variable
      ggplot(aes(x = V, y = value, fill = V)) +
      geom_boxplot() +  
      geom_line(data = . %>%                             # further data manipulation for line layer
                  tidyr::pivot_wider(names_from = V) %>% # arrange values in pairs
                  count(facet, D, F) %>%                 # & aggregate them for each treatment
                  mutate(n = cut(n, breaks = c(0, 5, 10, Inf),
                                     labels = c("n \u2264 5", "5 < n \u2264 10", "n > 10"))) %>%
                  mutate(line.group = seq(1, n())) %>%   # add grouping identifier for line
                  tidyr::pivot_longer(cols = D:F, names_to = "V"),  # return to long form
                aes(group = line.group, 
                    colour = forcats::fct_rev(n)), # reverse category order for count
                    linewidth = 2) +               # thicker lines for easier comparison (use size = 2 with ggplot2 < 3.4.0)
      
      facet_wrap(~facet,     # split into 4 plot facets, one for each treatment
                 labeller = labeller(facet = function(x) paste("Paired boxplot of", x, "ratings"))) +
      scale_x_discrete(labels = c("draft", "final")) +
      labs(y = "rating", colour = "Number of\ncounts") +
          ggsci::scale_fill_jco(guide = "none") + # not showing legend since it's the same as x-axis
      scale_colour_grey() +
      theme_pubr() +
      theme(legend.position = "right",
            axis.title.x = element_blank())
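
If a continuous shade per count is preferred over three bins, the same counting idea can feed a colour gradient. The following is a rough sketch rather than a drop-in replacement: the data frame `toy` is invented, only a single rating column is shown, and `geom_segment()` with `scale_colour_gradient()` is swapped in for the binned `geom_line()` layer above. It assumes ggplot2 3.4.0 or later for the `linewidth` argument.

```r
library(dplyr)
library(ggplot2)

# Hypothetical paired ratings for a single column (illustration only)
toy <- data.frame(draft = c(3, 3, 3, 2, 2, 1, 4, 4),
                  final = c(2, 2, 2, 1, 1, 1, 3, 4))

# One row per distinct draft -> final pair; n is the number of observations on that line
seg <- count(toy, draft, final)

p <- ggplot(seg) +
  geom_segment(aes(x = "draft", xend = "final",
                   y = draft, yend = final, colour = n),
               linewidth = 1.5) +
  scale_colour_gradient(low = "grey85", high = "black") + # more observations -> darker line
  labs(x = NULL, y = "rating", colour = "Number of\npairs")

p
```

To get one panel per rating column, the counts would also need the `facet` column, as in the `count(facet, D, F)` step of the pipeline above.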
    
