r indexing plot ggplot2 density-plot

Plot multiple plots corresponding to multiple columns, specified by index, on 1 graph/axes using ggplot2


Here is the data frame:

> test
             a           b          c
1   0.22904349 -0.12023869  0.1546898
2   1.09504754 -0.20398923 -0.9313251
3  -0.41200391 -0.16308791  0.6716105
4  -0.04356308 -1.81898245 -0.8074506
5  -1.23413459  1.24309479 -1.3861049
6   0.14266136 -2.22712577 -0.2341793
7  -0.25113445  0.60213281 -0.8106908
8   2.52372557  0.03794341 -1.4308955
9   0.66005867  0.74508029 -0.2922560
10  1.23552452 -0.26187445 -0.9874546

I want to plot the densities of a, b and c on a single graph, and I want to be able to specify the columns to plot by their indices. Ideally, each density would also be colored according to its column. This is the code I tried:

test<- as.data.frame(cbind(a=rnorm(1:10),b=rnorm(1:10),c=rnorm(1:10)))
for(i in seq(1,ncol(test),1)){
  if(i==1)p<-ggplot(data=test, aes_string(x=names(test)[i]))
  else p<-p+ggplot(data=test, aes_string(x=names(test)[i]))
}
p+geom_density() 

Error I got:

Error in p + o : non-numeric argument to binary operator
In addition: Warning message:
Incompatible methods ("+.gg", "Ops.data.frame") for "+" 

Please advise. Thanks.


Solution

  • The standard ggplot way is to use long data, not wide data:

    library(tidyr)
    test_long = gather(test)
    
    ggplot(test_long, aes(x = value, color = key)) +
        geom_density()
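
    Note that gather() still works but has been superseded in tidyr (1.0.0 and later) by pivot_longer(). A rough equivalent of the reshape above might look like the sketch below; the names "key" and "value" are passed explicitly only so the result lines up with gather()'s defaults, and p_all is just an illustrative variable name.

    ```r
    library(tidyr)
    library(ggplot2)

    # Example data in the same shape as the question's `test` (values are random)
    test <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))

    # Reshape wide -> long: one row per (column name, value) pair.
    # names_to / values_to are set to mimic gather()'s default column names.
    test_long <- pivot_longer(test, cols = everything(),
                              names_to = "key", values_to = "value")

    # One density curve per original column, colored by column name
    p_all <- ggplot(test_long, aes(x = value, color = key)) +
        geom_density()
    print(p_all)
    ```

    The row order differs from gather()'s output, which makes no difference to a density plot.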
    

    If you really want indices in there, we'll add them to the long data:

    test_long$index = match(test_long$key, names(test))
    

    Then, to select which columns to plot, subset the data passed to ggplot:

    # if you only want columns 2 and 3 from the original data
    ggplot(test_long[test_long$index %in% c(2, 3), ],
           aes(x = value, color = key)) +
        geom_density()
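
    If the index column is only needed for selection, another option is to subset the wide data by position before reshaping. The sketch below wraps that in a small helper; plot_densities_by_index is a hypothetical name, not something from ggplot2 or tidyr.

    ```r
    library(tidyr)
    library(ggplot2)

    # Hypothetical helper (illustration only): plot the densities of the
    # columns of `df` selected by integer position `cols`
    plot_densities_by_index <- function(df, cols) {
        # Subset by position first, then reshape only the chosen columns;
        # drop = FALSE keeps a data frame even when a single column is selected
        df_long <- gather(df[, cols, drop = FALSE])
        ggplot(df_long, aes(x = value, color = key)) +
            geom_density()
    }

    # Example data in the same shape as the question's `test` (values are random)
    test <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))

    # Plot only columns 2 and 3 (b and c)
    p_23 <- plot_densities_by_index(test, c(2, 3))
    print(p_23)
    ```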
    

    And, if you really want to be stubborn, the problem with your for loop is that ggplot() is called multiple times. ggplot() initializes a plot, so its result can't be added to an existing plot; only layers such as geom_density() can be added. You can fix it, but you shouldn't do things this way.

    p = ggplot(data = test)
    
    for(i in seq_along(test)) {
      if (i == 1) p = p + geom_density(aes_string(x = names(test)[i]))
      else p = p + geom_density(aes_string(x = names(test)[i]), color = "green")
    }
    
    print(p)
    

    In this case ggplot isn't being used as intended, so you'd have to set up your own colors, and adding a legend will be a real pain. That's part of why you should do it the other way, the easy way.
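
    If you go the layer-per-column route anyway, be aware that aes_string() has been deprecated since ggplot2 3.0.0. A rough sketch of the same idea using the .data pronoun is below. Mapping a constant label to color inside aes() is one way to get a legend without reshaping. The names cols, layers and p_layers are illustrative only.

    ```r
    library(ggplot2)

    # Example data in the same shape as the question's `test` (values are random)
    test <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
    cols <- c(2, 3)  # column positions to plot

    # One density layer per selected column. lapply() is used instead of a for
    # loop so that each layer's aes() sees its own value of `nm`; aes() is
    # evaluated lazily, so a plain loop variable would end up as its last value.
    layers <- lapply(names(test)[cols], function(nm) {
        geom_density(aes(x = .data[[nm]], color = nm))
    })

    # Adding a list of layers to a plot adds each element in turn
    p_layers <- ggplot(test) + layers + labs(x = "value", color = "column")
    print(p_layers)
    ```

    This still builds the plot column by column, so the long-data approach remains the simpler one.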


    Edit: In a fresh R session, this runs just fine for me:

    # load packages
    library(tidyr)
    library(ggplot2)
    
    # data from the question
    test <- as.data.frame(cbind(a=rnorm(1:10),b=rnorm(1:10),c=rnorm(1:10)))
    
    # long format
    test_long = gather(test)
    
    # plot all 3
    ggplot(test_long, aes(x = value, color = key)) +
        geom_density()
    
    # add original data indices
    test_long$index = match(test_long$key, names(test))
    
    # plot only columns 2 and 3
    ggplot(test_long[test_long$index %in% c(2, 3), ],
           aes(x = value, color = key)) +
        geom_density()