Search code examples
rdplyrpivottidyversereshape2

Pivoting to a longer format using pivot_longer


I'm trying to pivot to a longer format using dplyr::pivot_longer, but can't seem to get it to do what I want. I can manage with reshape::melt, but I'd also like to be able to achieve the same using pivot_longer.

The data I'm trying to reformat is a correlation matrix of the mtcars-dataset:

# Load packages
library(reshape2)
library(dplyr)

# Get the correlation matrix
mydata <- mtcars[, c(1,3,4,5,6,7)]
cormat <- round(cor(mydata),2)

head(cormat)
       mpg  disp    hp  drat    wt  qsec
mpg   1.00 -0.85 -0.78  0.68 -0.87  0.42
disp -0.85  1.00  0.79 -0.71  0.89 -0.43
hp   -0.78  0.79  1.00 -0.45  0.66 -0.71
drat  0.68 -0.71 -0.45  1.00 -0.71  0.09
wt   -0.87  0.89  0.66 -0.71  1.00 -0.17
qsec  0.42 -0.43 -0.71  0.09 -0.17  1.00

Then, I want only to filter out the upper triangle of the matrix;

#Get upper triangle of the correlation matrix
cormat[upper.tri(cormat)] <- NA #OR upper.tri function

And then reshape it into a long format:

# Reshape into a long format
melted_cormat <- 
  cormat %>% 
  melt(na.rm=TRUE)

head(melted_cormat)
   Var1 Var2 value value_2
1   mpg  mpg  1.00       1
7   mpg disp -0.85   -0.85
8  disp disp  1.00       1
13  mpg   hp -0.78   -0.78
14 disp   hp  0.79    0.79
15   hp   hp  1.00       1

Finally, what the figure that I'm making is:

ggplot(data = melted_cormat, aes(Var2, Var1, fill = value))+
  geom_tile(color="white") +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1,1), 
                       #space = "Lab", 
                       name="Spearman\nCorrelation") +
  theme_minimal()+ 
  coord_fixed() +
  geom_text(aes(Var2, Var1, label = value), color = "black", size = 4) +
  theme(
    axis.text.x=element_text(family="Calibri", face="plain", color="black", size=12, angle=0), 
    axis.title.x=element_blank(),
    axis.title.y=element_blank(),
    panel.grid.major=element_blank(),
    panel.border=element_blank(),
    panel.background=element_blank(),
    axis.ticks = element_blank(),
    legend.justification = c(1, 0),
    legend.position = c(0.9, 0.3),
    legend.direction = "horizontal")+
  guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
                               title.position = "top", title.hjust = 0.5))

enter image description here

I cant seem to figure out a way to use pivot_longer in stead of reshape so that it still correctly makes the figure. The following almost works (thanks to @geoff), the dataset seems to be correct but the figure does not:

melted_cormat <- 
  cormat %>% 
  as_tibble() %>% 
  mutate(Var1 = colnames(cormat)) %>% 
  pivot_longer(names_to = "Var2", values_to = "value", mpg:qsec, values_drop_na=TRUE)

running the same ggplot code as above gives:

enter image description here


Solution

  • Compare str(melted_cormat) with str(pivoted_cormat). You'll find that the older reshape2::melt() converts strings to factors whereas tidyr::pivot_longer() leaves them strings.

    The consequence of this is that with the melted version, ggplot() will order the rows and columns according to the factor levels and thus preserve the original order in cormat, but in the second case where they're just plain strings, they just go in alphabetical order.

    To fix this, simply mutate() Var1 and Var2 into factor using as levels the original order of the columns in cormat. This will give you the plot you wanted.

    Observe the difference in the last two lines of the example below and also note that the default method of cor is "pearson" so just be careful when you label your legend with the correlation method.

    # Load packages
    library(tidyverse)
    library(reshape2)
    
    # define plotting function
    plot_fun <- function(dat) {
      ggplot(data = dat, aes(Var2, Var1, fill = value)) +
        geom_tile(color = "white") +
        scale_fill_gradient2(
          low = "blue",
          high = "red",
          mid = "white",
          midpoint = 0,
          limit = c(-1, 1),
          #space = "Lab",
          name = "Spearman\nCorrelation"
        ) +
        theme_minimal() +
        coord_fixed() +
        geom_text(aes(Var2, Var1, label = value),
                  color = "black",
                  size = 4) +
        theme(
          axis.text.x = element_text(
            family = "Calibri",
            face = "plain",
            color = "black",
            size = 12,
            angle = 0
          ),
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          panel.grid.major = element_blank(),
          panel.border = element_blank(),
          panel.background = element_blank(),
          axis.ticks = element_blank(),
          legend.justification = c(1, 0),
          legend.position = c(0.9, 0.3),
          legend.direction = "horizontal"
        ) +
        guides(fill = guide_colorbar(
          barwidth = 7,
          barheight = 1,
          title.position = "top",
          title.hjust = 0.5
        ))
    }
    
    # Get the correlation matrix
    cormat <- mtcars[, c(1, 3, 4, 5, 6, 7)] %>%
      cor(., method = "spearman") %>% # note selection of correlation method
      round(2) %>%
      replace(upper.tri(.), NA)
    
    # make melted version
    melted <- cormat %>%
      melt(na.rm = TRUE)
    
    # make pivoted version
    pivoted <-
      cormat %>%
      as.data.frame() %>%
      rownames_to_column("Var1") %>%
      pivot_longer(
        -Var1,
        names_to = "Var2",
        values_to = "value",
        values_drop_na = TRUE
      )
    
    # note column types on melted vs pivoted
    str(melted)
    #> 'data.frame':    21 obs. of  3 variables:
    #>  $ Var1 : Factor w/ 6 levels "mpg","disp","hp",..: 1 2 3 4 5 6 2 3 4 5 ...
    #>  $ Var2 : Factor w/ 6 levels "mpg","disp","hp",..: 1 1 1 1 1 1 2 2 2 2 ...
    #>  $ value: num  1 -0.91 -0.89 0.65 -0.89 0.47 1 0.85 -0.68 0.9 ...
    str(pivoted)
    #> tibble [21 x 3] (S3: tbl_df/tbl/data.frame)
    #>  $ Var1 : chr [1:21] "mpg" "disp" "disp" "hp" ...
    #>  $ Var2 : chr [1:21] "mpg" "mpg" "disp" "mpg" ...
    #>  $ value: num [1:21] 1 -0.91 1 -0.89 0.85 1 0.65 -0.68 -0.52 1 ...
    
    # melted version gives desired plot
    melted %>% 
      plot_fun()
    

    # pivoted version orders variables in alphabetical order
    pivoted %>% 
      plot_fun()
    

    # turning the variable names into a factor fixes the plot
    pivoted %>% 
      mutate(across(starts_with("Var"), ~factor(.x, levels = colnames(cormat)))) %>%
      plot_fun()
    

    Created on 2022-01-12 by the reprex package (v2.0.1)