I have a figure in which I compare several (around 12) unrelated descriptors. To make it easier to display all of these plots, I build a list of them:
```r
library(ggplot2)

comb <- lapply(colnames(iris[1:4]), function(x) ggplot(iris, aes(x = get(x))) +
  geom_histogram(position = "identity", aes(y = after_stat(ncount), fill = Species), bins = 10) +
  theme_classic() +
  facet_grid(Species ~ ., scales = "free_y") +
  theme(legend.position = "none",
        panel.spacing = unit(2, "lines"),
        legend.title = element_blank(),
        strip.background = element_blank(),
        strip.text.y = element_blank(),
        plot.margin = unit(c(10, 10, 10, 10), "points")
  ) +
  xlab(x))
```
which I then pass to the `ggarrange()` function from the ggpubr package
```r
library(ggpubr)

ggarrange(plotlist = comb, common.legend = TRUE, legend = "bottom", ncol = 2, nrow = 2)
```
to create a figure that suits my needs.
However, some of my columns contain extreme outliers, so I need plots that display only the data up to the 90th percentile of each column in my data frame.
I would like to implement a solution similar to the one presented by Warner in this question: (show only 0-90% or 0-95% percentile), but I have not been able to adapt it to my code. What I am looking for is a way to apply the information obtained from the line:
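```r
# 0% and 90% quantiles of each numeric column (the Species factor is excluded,
# because quantile() does not accept unordered factors)
quantiles <- lapply(iris[1:4], quantile, c(0, 0.9))
```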
so that each plot created by the `lapply()` call above shows only the data up to the 90th percentile of its column.
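A minimal sketch of one way to do this while keeping the original `lapply()` structure: look up each column's 90% cutoff in `quantiles` and subset `iris` before handing it to `ggplot()`. It assumes `quantiles` is computed on the numeric columns `iris[1:4]` only, keeps rows at or below the cutoff (`<=`), maps the column with the `.data[[x]]` pronoun instead of `get(x)`, and trims the `theme()` settings for brevity; it is one possible approach, not the only one.

```r
library(ggplot2)

# 0% and 90% quantiles of each numeric column, named by column
quantiles <- lapply(iris[1:4], quantile, c(0, 0.9))

comb <- lapply(colnames(iris[1:4]), function(x) {
  upper <- quantiles[[x]][[2]]        # second element = 90% cutoff for column x
  kept <- iris[iris[[x]] <= upper, ]  # drop the rows above the cutoff
  ggplot(kept, aes(x = .data[[x]])) +
    geom_histogram(position = "identity", aes(y = after_stat(ncount), fill = Species), bins = 10) +
    theme_classic() +
    facet_grid(Species ~ ., scales = "free_y") +
    theme(legend.position = "none") +
    xlab(x)
})
```

The resulting `comb` list can be passed to the same `ggarrange()` call as before.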
I think you want to remove the data above the 90th percentile and plot what remains. Here's some code to do this. I moved the plotting code into a separate function to make it easier to debug, and made the quantile a parameter so that it is easy to change. I also used `aes_string` in the `ggplot` call instead of needing to use `get`.
```r
library(dplyr)    # %>% and filter()
library(ggplot2)
library(ggpubr)   # ggarrange()

myplot <- function(x, q) {
  cutoff <- quantile(iris[[x]], q)  # Calculate the required quantile of column x
  # Keep only the rows at or below that quantile
  filtered_data <- iris %>% dplyr::filter(.data[[x]] <= cutoff)
  ggplot(filtered_data, aes_string(x = x)) +
    geom_histogram(position = "identity", aes(y = after_stat(ncount), fill = Species), bins = 10) +
    theme_classic() +
    facet_grid(Species ~ ., scales = "free_y") +
    theme(legend.position = "none",
          panel.spacing = unit(2, "lines"),
          legend.title = element_blank(),
          strip.background = element_blank(),
          strip.text.y = element_blank(),
          plot.margin = unit(c(10, 10, 10, 10), "points")
    ) +
    xlab(x)
}
```
```r
comb <- lapply(colnames(iris[1:4]), function(x) myplot(x, 0.9))
ggarrange(plotlist = comb, common.legend = TRUE, legend = "bottom", ncol = 2, nrow = 2)
```
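Because the `iris` measurements are recorded to one decimal place and contain many ties, a 90th-percentile filter does not necessarily remove exactly 10% of the rows of every column. A small base-R sketch for checking how much of each column survives a `<= cutoff` filter, assuming R's default `quantile()` type; the names `q`, `cutoffs` and `kept_fraction` are illustrative only:

```r
q <- 0.9

# One unnamed cutoff per numeric column; sapply() names the result by column
cutoffs <- sapply(iris[1:4], quantile, probs = q, names = FALSE)

# Fraction of rows in each column that are at or below that column's cutoff
kept_fraction <- mapply(function(col, cutoff) mean(col <= cutoff),
                        iris[1:4], cutoffs)

round(kept_fraction, 3)
```

If a column keeps noticeably more than the requested fraction, ties at the cutoff value are the likely reason.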