r, ggplot2, boxplot, melt

Selecting columns from a long melt output for a boxplot


I have a large melt output (dimensions 4608940 x 2) built from 1000 columns with roughly 4000+ rows each. The columns do not all have the same number of points, so the groups in the variable column differ in size.

Is there a way to select certain data within the melted output to use with ggplot2's geom_boxplot()? Say column 50, column 130 and column 650?

This is easily done using base R's boxplot() and the original data.
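For reference, the base R approach mentioned above might look like the following minimal sketch. The data frame name `wide` and its NA-padded layout are assumptions made for illustration, since the original data is not shown.

```r
# Hypothetical stand-in for the original wide data: 1000 columns of
# unequal length, padded with NA up to 4000 rows (an assumed layout).
set.seed(1)
wide <- as.data.frame(sapply(1:1000, function(i) {
  n <- sample(3000:4000, 1)
  c(rnorm(n), rep(NA, 4000 - n))
}))

# Base R: index the columns of interest directly; boxplot() ignores NAs.
bp <- boxplot(wide[, c(50, 130, 650)], xlab = "Column", ylab = "Value")
```

The value returned by boxplot() holds the computed statistics and the group names, so `bp$names` can be used to check which columns were drawn.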


Solution

  • # Load the packages used below
    library(data.table)
    library(ggplot2)

    # Get some data (1000 columns, 4000 rows); columns are named V1..V1000
    df <- data.table(sapply(seq_len(1000), function(x) rnorm(4000)))
    
    # Melt the data (result is 4,000,000 x 2)
    plot_input <- melt(df, id.vars = NULL, measure.vars = colnames(df),
                       variable.name = "col_num", value.name = "value")
    
    # Boxplots of selected columns: filter the melted rows by column name
    ggplot(
        plot_input[col_num %in% c("V50", "V130", "V650")],
        aes(y = value, x = col_num, color = col_num)) +
    geom_boxplot() +
    theme(legend.position = "none") + labs(x = "Column", y = "Value")
    

    Figure: boxplots of the selected columns from the melted data
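The question mentions that the columns do not all have the same number of points, which the equal-length example above does not cover. A minimal sketch of one way to handle that, assuming the shorter columns are padded with NA (an assumption; the real data layout is not shown), is to drop the padding during the melt with `na.rm = TRUE` and to select columns by position through `names(df)`. The small three-column table and the name `wanted` are hypothetical.

```r
library(data.table)
library(ggplot2)

# Hypothetical data: three columns with 5, 8 and 10 points, padded with NA
set.seed(42)
lens <- c(V1 = 5, V2 = 8, V3 = 10)
df <- as.data.table(lapply(lens, function(n) c(rnorm(n), rep(NA_real_, 10 - n))))

# na.rm = TRUE drops the NA padding, so each group keeps only its real points
plot_input <- melt(df, measure.vars = names(df),
                   variable.name = "col_num", value.name = "value",
                   na.rm = TRUE)

# Select columns by position instead of typing their names
wanted <- names(df)[c(1, 3)]
selected <- plot_input[col_num %in% wanted]

p <- ggplot(selected, aes(x = col_num, y = value, color = col_num)) +
  geom_boxplot() +
  theme(legend.position = "none") +
  labs(x = "Column", y = "Value")
```

With the 1000-column data, the same pattern would use `names(df)[c(50, 130, 650)]` for the selection.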