r ggplot2 histogram percentile

Plotting only quantiles in a ggarrange plot


I have a plot where I am comparing several (around 12) unrelated descriptors. To facilitate the display of all these plots, I make a list:

library(facetscales)
library(ggplot2)
library(ggpubr)  # provides ggarrange()

comb <- lapply(colnames(iris[1:4]), function(x) ggplot(iris, aes(x = get(x))) + 
                 geom_histogram(position = "identity", aes(y= ..ncount.., fill = Species), bins = 10) +
                 theme_classic() + 
                 facet_grid(Species~., scales ="free_y") +
                 theme(legend.position = 'None',

                       panel.spacing = unit(2, "lines"),
                       legend.title = element_blank(),
                       strip.background = element_blank(),
                       strip.text.y = element_blank(),
                       plot.margin = unit(c(10,10,10,10), "points")
                 )+
                 xlab(x) +
                 scale_x_continuous() 
)

which I use with the ggarrange function

ggarrange(plotlist = comb, common.legend = TRUE, legend = "bottom", ncol = 2, nrow = 2) 

to create a plot which suits my needs:

[Image: ggarrange plots]

However, some of my data have extreme outliers. I therefore need plots that display only the data up to the 90% quantile of each column in my data frame.

I would like to implement a solution similar to the one presented by Warner in this question (show only 0-90% or 0-95% percentile), but I am unable to adapt it to what I have. What I am looking for is a way to apply the information obtained from the line:

quantiles <- lapply(iris[1:4], quantile, c(0, 0.9)) # find 90% quantiles for the numeric columns (quantile() fails on the Species factor)

so that only the data up to the 90th percentile is displayed by the plots built in the lapply call above.
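
As a minimal sketch of how the output of that line can be used, the snippet below restricts the call to the four numeric columns (`quantile()` rejects the unordered `Species` factor) and looks up one cutoff by name. The variable names `cutoff` and `kept` are illustrative only, and keeping values equal to the cutoff (`<=`) is an assumption.

```r
# quantile() is applied to the numeric columns only.
quantiles <- lapply(iris[1:4], quantile, probs = c(0, 0.9))

# Each list element is a named numeric vector with entries "0%" and "90%".
cutoff <- quantiles[["Sepal.Length"]][["90%"]]

# Rows of iris whose Sepal.Length is at or below the 90% quantile.
kept <- iris[iris$Sepal.Length <= cutoff, ]
```

The same lookup can be repeated for each column name inside the `lapply` call that builds the plots.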


Solution

  • I think you want to remove data above the 90th percentile and plot what remains. Here's some code to do this. I moved the code to a separate function to make it easier to debug and made the quantile value a parameter so it is easy to change. I also used aes_string in the ggplot call instead of needing to use get.

    library(facetscales)
    library(ggplot2)
    library(ggpubr)
    library(dplyr)   # provides %>%, select(), filter() and all_of()
    
    myplot <- function(x, q) {
        data <- iris %>% dplyr::select(all_of(x))   # Select the column of interest
        quantiles <- quantile(data[,1], q)  # Calculate the required quantile
        filtered_data <- iris %>% dplyr::filter(.data[[x]] < quantiles[1]) # Keep only rows strictly below that quantile
        ggplot(filtered_data, aes_string(x = x)) +
            geom_histogram(position = "identity", aes(y= ..ncount.., fill = Species), bins = 10) +
            theme_classic() + 
            facet_grid(Species~., scales ="free_y") +
            theme(legend.position = 'None',
                        
                        panel.spacing = unit(2, "lines"),
                        legend.title = element_blank(),
                        strip.background = element_blank(),
                        strip.text.y = element_blank(),
                        plot.margin = unit(c(10,10,10,10), "points")
            ) +
            xlab(x) +
            scale_x_continuous() 
    }
    comb <- lapply(colnames(iris[1:4]), function(x) myplot(x, 0.9))
    ggarrange(plotlist = comb, common.legend = TRUE, legend = "bottom", ncol = 2, nrow = 2) 
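
Recent ggplot2 releases deprecate both `aes_string()` and the `..ncount..` notation. The following is a minimal sketch of the same idea using `.data[[x]]` and `after_stat(ncount)` instead, assuming ggplot2 3.3.0 or later. The helper name `plot_below_quantile` and its `df` argument are hypothetical, the data frame is assumed to contain a `Species` column, and the filter uses `<=` (keeping values equal to the cutoff), which differs slightly from the `<` used above. The theme tweaks are omitted for brevity.

```r
library(ggplot2)

# Hypothetical helper: histogram of column `x` of `df`, keeping only the rows
# at or below the q-th quantile of that column.
plot_below_quantile <- function(df, x, q = 0.9) {
  cutoff <- quantile(df[[x]], probs = q, na.rm = TRUE)
  kept <- df[!is.na(df[[x]]) & df[[x]] <= cutoff, , drop = FALSE]
  ggplot(kept, aes(x = .data[[x]])) +
    geom_histogram(aes(y = after_stat(ncount), fill = Species),
                   position = "identity", bins = 10) +
    facet_grid(Species ~ ., scales = "free_y") +
    theme_classic() +
    xlab(x)
}

# One plot per numeric column; each column name is passed as `x`.
comb <- lapply(colnames(iris)[1:4], plot_below_quantile, df = iris, q = 0.9)
```

The resulting list can be passed to `ggarrange(plotlist = comb, ...)` in the same way as before.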
    
