Plotting two overlapping density curves using ggplot


I have a data frame in R with 104 columns that looks like this:

   id         vcr1       vcr2         vcr3  sim_vcr1  sim_vcr2  sim_vcr3  sim_vcr4  sim_vcr5  sim_vcr6  sim_vcr7
1 2913 -4.782992840  1.7631999  0.003768704  1.376937 -2.096857  6.903021  7.018855  6.135139  3.188382  6.905323
2 1260  0.003768704  3.1577108 -0.758378208  1.376937 -2.096857  6.903021  7.018855  6.135139  3.188382  6.905323
3 2912 -4.782992840  1.7631999  0.003768704  1.376937 -2.096857  6.903021  7.018855  6.135139  3.188382  6.905323
4 2914 -1.311132669  0.8220594  2.372950077 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574
5 2915 -1.311132669  0.8220594  2.372950077 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574
6 1261  2.372950077 -0.7022792 -4.951318264 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574

The "sim_vcr*" columns continue all the way through sim_vcr100.
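The post doesn't include a reproducible version of the data, so here is a minimal sketch that builds a hypothetical stand-in with the same column layout: an id column, vcr1 to vcr3, and sim_vcr1 to sim_vcr100. The values are random draws from `rnorm()`, not the real data. The row count of 6 only matches the printed excerpt.

```r
# Hypothetical stand-in for the asker's data: random values, same column layout
set.seed(42)
n <- 6  # rows, matching the printed excerpt

# Three "observed" columns named vcr1..vcr3
vcr <- matrix(rnorm(n * 3, sd = 3), nrow = n,
              dimnames = list(NULL, paste0("vcr", 1:3)))
# One hundred "simulated" columns named sim_vcr1..sim_vcr100
sim <- matrix(rnorm(n * 100, sd = 5), nrow = n,
              dimnames = list(NULL, paste0("sim_vcr", 1:100)))

# data.frame() takes the matrix column names as-is
df <- data.frame(id = sample(1000:3000, n), vcr, sim)

dim(df)  # 6 rows, 104 columns (id + 3 vcr + 100 sim_vcr)
```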

I need two overlapping density curves in one plot, something like this (except that this example shows 5 curves instead of 2):

[Image: example plot with five overlapping, semi-transparent density curves]

One density curve should pool all values in columns vcr1, vcr2, and vcr3. The other should pool all values in the sim_vcr* columns (100 columns, sim_vcr1 to sim_vcr100).

Because the two curves overlap, they need to be transparent, as in the attached image. I know there is a fairly straightforward way to do this with ggplot, but I am having trouble with the syntax. I am also having trouble orienting my data frame so that each density curve pulls from the proper columns.
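The "orientation" problem comes down to ggplot wanting long data. It needs one column holding the values and one column saying which curve each value belongs to. Here is a base-R sketch of that reshaping. It assumes the column names follow the `vcr*` / `sim_vcr*` pattern shown above. The small data frame is a hypothetical stand-in filled with random values.

```r
library(ggplot2)

set.seed(1)
n <- 6
# Hypothetical stand-in with the same naming scheme (random values)
df <- data.frame(
  id = seq_len(n),
  matrix(rnorm(n * 3), n, dimnames = list(NULL, paste0("vcr", 1:3))),
  matrix(rnorm(n * 100, sd = 5), n, dimnames = list(NULL, paste0("sim_vcr", 1:100)))
)

# Pick the two column sets by name prefix
vcr_cols <- grep("^vcr", names(df), value = TRUE)
sim_cols <- grep("^sim_vcr", names(df), value = TRUE)

# Stack every value of each set into one column, tagged with its group
long <- rbind(
  data.frame(value = unlist(df[vcr_cols], use.names = FALSE), group = "vcr"),
  data.frame(value = unlist(df[sim_cols], use.names = FALSE), group = "sim_vcr")
)

# One semi-transparent density curve per group
p <- ggplot(long, aes(x = value, fill = group)) +
  geom_density(alpha = 0.5)
print(p)
```

Each group gets its own `fill` colour, and `alpha` controls how transparent the overlapping areas are.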

Any help is much appreciated.


Solution

  • With df being the data you mentioned in your post, you can try this:

    Split the data frame into the two column sets with the following code, then plot each:

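    library(tidyverse)  # loads tidyr (pivot_longer) and ggplot2
    # Column indices of the observed and the simulated variables
    i1 <- which(startsWith(names(df), 'vcr'))
    i2 <- which(startsWith(names(df), 'sim'))
    # Keep the id column (column 1) together with each set
    df1 <- df[, c(1, i1)]
    df2 <- df[, c(1, i2)]
    # Reshape to long format: one row per (id, column) pair
    M1 <- pivot_longer(df1, cols = names(df1)[-1])
    M2 <- pivot_longer(df2, cols = names(df2)[-1])
    # Plot 1: one density curve per vcr column
    ggplot(M1) + geom_density(aes(x = value, fill = name), alpha = .5)
    # Plot 2: one density curve per sim_vcr column
    ggplot(M2) + geom_density(aes(x = value, fill = name), alpha = .5)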
    

    [Image: Plot 1, one density curve per vcr column]

    [Image: Plot 2, one density curve per sim_vcr column]

    Update

    Use the following code to get both groups in a single plot:

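    # Single plot with two curves
    # Reshape every column except id (column 1) to long format
    M <- pivot_longer(df, cols = names(df)[-1])
    # Label each row with the group its source column belongs to
    M$var <- ifelse(startsWith(M$name, 'vcr'), 'vcr', 'sim_vcr')
    # Plot 3: one semi-transparent density curve per group
    ggplot(M) + geom_density(aes(x = value, fill = var), alpha = .5)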
    

    [Image: Plot 3, overlapping vcr and sim_vcr density curves]
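Here is a variant of Plot 3 that selects columns by name (`cols = -id`) instead of by position, so it does not depend on `id` being the first column. It is a sketch using dplyr and tidyr, run on a hypothetical random stand-in for `df`:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(7)
n <- 6
# Hypothetical stand-in with the asker's column layout (random values)
df <- data.frame(
  id = seq_len(n),
  matrix(rnorm(n * 3), n, dimnames = list(NULL, paste0("vcr", 1:3))),
  matrix(rnorm(n * 100, sd = 5), n, dimnames = list(NULL, paste0("sim_vcr", 1:100)))
)

# Long format: every column except id becomes (name, value) rows,
# then each row is labelled with the curve it belongs to
M <- df %>%
  pivot_longer(cols = -id, names_to = "name", values_to = "value") %>%
  mutate(var = if_else(startsWith(name, "vcr"), "vcr", "sim_vcr"))

# Sanity check: how many values feed each curve
print(count(M, var))

p <- ggplot(M, aes(x = value, fill = var)) +
  geom_density(alpha = 0.5) +
  labs(fill = NULL)
print(p)
```

By default `geom_density()` scales each group's curve to integrate to 1. The vcr curve (3 columns) therefore stays comparable to the sim_vcr curve (100 columns), even though it is built from far fewer values.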