
Adding part of the data as a table to a density plot in ggplot2


I have a file whose rows fall into two categories, in and out, with most rows in one category.

file1_ggplot.txt

status scores
in     44
in     55
out    12
out    23
out    99
out    13

To plot the density distribution I am using the code below, but I also want to add a summary of the scores and the lines whose status is in:

library(data.table)
library(ggplot2)
library(plyr)
library(tools)  # file_path_sans_ext
# pattern is a regular expression, not a shell glob
filenames <- list.files("./scores", pattern = "ggplot\\.txt$", full.names = TRUE)
pdf("plot.pdf")
for (file in filenames) {
  bases <- file_path_sans_ext(file)
  data1 <- fread(file)
  # Mean score per status, drawn as a dashed vertical line
  cdat <- ddply(data1, "status", summarise, scores.mean = mean(scores))
  data1ggplot <- ggplot(data1, aes(x = scores, colour = status)) +
    geom_density() +
    geom_vline(data = cdat, aes(xintercept = scores.mean, colour = status),
               linetype = "dashed", size = 1)
  print(data1ggplot + ggtitle(basename(bases)))
}
dev.off()

This outputs: [image: ggplot for two categories]

I want to add a box containing the lines whose status is in:

in     44
in     55

and the summary of the scores:

> summary(data1$scores)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  12.00   15.50   33.50   41.00   52.25   99.00 

For this, I am trying to use tableGrob:

library(gridExtra)  # tableGrob
data1ggplot <- ggplot(data1, aes(x=scores, colour=status)) + geom_density() + geom_vline(data=cdat, aes(xintercept=scores.mean, colour=status), linetype="dashed", size=1) +  annotation_custom(tableGrob(summary(data1$scores)))

[image: ggplot2.2]

But it gives the same plot as above, with a table that contains only the numbers of the summary.
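
The table most likely shows bare numbers because summary() returns a named numeric vector, and tableGrob() appears to lay out only the cell values, so the names (Min., Median, and so on) are lost. A minimal sketch of one workaround, assuming the sample scores above: convert the summary to a two-column data frame first. The column names statistic and value are illustrative choices, not part of the original code.

```r
# Sample scores from file1_ggplot.txt
scores <- c(44, 55, 12, 23, 99, 13)

# summary() returns a named vector; keep the names as their own column
s <- summary(scores)
summary_df <- data.frame(statistic = names(s), value = as.numeric(s))
print(summary_df)
```

summary_df can then be passed as tableGrob(summary_df, rows = NULL) in place of summary(data1$scores).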

Then I grepped the lines containing in:

cat file1_ggplot.txt | grep -w "in" > only-in.txt

Then in R:

data2<-fread("only-in.txt")

trs <- as.data.frame(t(data2))
trs
       V1 V2
    V1 in in
    V2 44 55
data1ggplot <- ggplot(data1, aes(x=scores, colour=status)) + geom_density() + geom_vline(data=cdat, aes(xintercept=scores.mean, colour=status), linetype="dashed", size=1) +  annotation_custom(tableGrob(trs))

And it outputs: [image: ggplot2.3]

How can I show these tables properly next to the plot, and select the lines with in without first using grep in bash?


Solution

  • Here is a solution, with some assumptions about the format of the table you want:

    Individual plot

    library(tidyverse)
    library(gridExtra) # tableGrob
    library(broom) # glance
    
    df_summary <- t(broom::glance(summary(data1$scores)))
    data1 %>%
      ggplot(., aes(x = scores, colour = status)) + 
      geom_density() + 
      geom_vline(data = . %>% 
                   group_by(status) %>%
                   summarise(scores.mean = mean(scores)), 
                 aes(xintercept = scores.mean, colour = status), 
                 linetype = "dashed", 
                 size = 1) +
      annotation_custom(tableGrob(rbind(data.frame(data1 %>% filter(status == "in") %>% rename(var = status, val = scores)),
                                        data.frame(var = row.names(df_summary), val = df_summary, row.names = NULL)), 
                                        rows = NULL, cols = NULL),
                        xmin = 60, xmax = 100,
                        ymin = 0.1, ymax = 0.4)
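
    Because filter(status == "in") selects the in rows inside R, the grep step from the question is not needed. A minimal base-R sketch of the same selection, assuming the sample data from the question is typed in directly instead of being read from the file:

    ```r
    # Sample data from the question (normally read with fread)
    data1 <- data.frame(
      status = c("in", "in", "out", "out", "out", "out"),
      scores = c(44, 55, 12, 23, 99, 13)
    )

    # Keep only the rows whose status is "in"; no shell grep required
    in_rows <- data1[data1$status == "in", ]
    print(in_rows)
    ```

    subset(data1, status == "in") would be an equivalent base-R spelling.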
    

    Applied to a list of data frames

    # Mock data (fixed values, so no random seed is needed)
    data_list = list(data1, 
                     data.frame(status = data1$status, scores = c(40, 60, 15, 21, 97, 10)),
                     data.frame(status = data1$status, scores = c(45, 56, 11, 25, 95, 14)))
    
    # Create a function
    your_function <- function(df) {
      df_summary <- t(broom::glance(summary(df$scores)))
      df %>%
        ggplot(., aes(x = scores, colour = status)) +
        geom_density() +
        geom_vline(data = . %>%
                     group_by(status) %>%
                     summarise(scores.mean = mean(scores)),
                   aes(xintercept = scores.mean, colour = status),
                   linetype = "dashed",
                   size = 1) +
        annotation_custom(tableGrob(rbind(data.frame(df %>% filter(status == "in") %>% rename(var = status, val = scores)),
                                          data.frame(var = row.names(df_summary), val = df_summary, row.names = NULL)),
                                    rows = NULL, cols = NULL),
                          xmin = 60, xmax = 100,
                          ymin = 0.1, ymax = 0.4)
    }
    
    # Check if it works 
    your_function(data_list[[2]])
    your_function(data_list[[3]])
    

    # Map it; print() each plot explicitly so it is drawn even when the script is sourced
    pdf("plot.pdf")
    walk(data_list, ~ print(your_function(.x)))
    dev.off()
    

    You should now have a "plot.pdf" file with 3 pages, one plot per page.

    Note that you should adapt the position of the tableGrob to your data. I don't know where you want the table; you can also compute the position from the summary values.
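
    One way to compute the position from the data, as suggested above, is sketched below. It is only an illustration: table_position and its frac argument are hypothetical names, and putting the table in the upper-right part of the panel is an assumption about where it should go. The peak height is estimated per status with stats::density(), which only approximates the y range of the plot and needs at least two rows in each group.

    ```r
    # Sample data from the question
    data1 <- data.frame(
      status = c("in", "in", "out", "out", "out", "out"),
      scores = c(44, 55, 12, 23, 99, 13)
    )

    # Hypothetical helper: bounding box for the table, in data coordinates
    table_position <- function(df, frac = 0.4) {
      x_range <- range(df$scores)
      # Highest density peak across the status groups
      peak <- max(sapply(split(df$scores, df$status),
                         function(x) max(density(x)$y)))
      list(xmin = x_range[2] - frac * diff(x_range),  # right-hand share of the x range
           xmax = x_range[2],
           ymin = 0.5 * peak,                         # upper half of the y range
           ymax = peak)
    }

    pos <- table_position(data1)
    ```

    The result can replace the hard-coded values in annotation_custom(), for example xmin = pos$xmin, xmax = pos$xmax, ymin = pos$ymin, ymax = pos$ymax.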