
How to set x-axes to the same scale after log-transformation with ggplot


I've been trying for weeks to solve an apparently simple problem while plotting two independent histograms with ggplot. Because the data do not follow a normal distribution, I log-transform them. But then I am unable to set the X-axes of the independent plots to the exact same scale.

Here is an example:

library(ggplot2)

#random data:
set.seed(123); g1 <- data.frame(rlnorm(1000, 1, 3))
set.seed(123); g2 <- data.frame(rlnorm(2000, 0.4, 1.2))
colnames(g1) <- "value"; colnames(g2) <- "value"

#plotting g1 in logscale
plot_g1 <- ggplot(g1, aes(x=value)) + 
labs(x = "value", y = "Frequency") +
geom_density(alpha=0.25)+
theme_classic(base_size =25, base_line_size = 0.5)

plot_g1.2 <- ggplot(g1, aes(x=value)) + 
geom_histogram(binwidth=2.5, position = "identity", aes(y=..density..),  alpha = 0.75) +
labs(x = "value", y = "Frequency") +
geom_density(alpha=0.25)+
theme_classic(base_size = 10, base_line_size = 0.5)

plot_g1.2_log <- plot_g1.2 + 
scale_x_continuous(trans="log2", labels = scales::number_format(accuracy = 0.01, decimal.mark = '.'), breaks = c(0, 0.01, 0.1, 1, 10, 100, 10000), limits=c(-100, 20000))

The plots are okay, but each X-axis is on a different scale. I've played with limits, binwidth, and breaks, but I can't make it work.


One solution is to plot both distributions together:

###combining both plots together
g1$cat <- "g1"; g2$cat <- "g2" ; g12 <- rbind(g1,g2)

plot_g12 <- ggplot(g12, aes(x=value, fill = cat, color = cat)) + 
labs(x = "value", y = "Frequency") +
geom_density(alpha=0.25)+
theme_classic(base_size =10, base_line_size = 0.5)

plot_g12.2 <- ggplot(g12, aes(x=value, fill = cat, color = cat)) + 
geom_histogram(binwidth=0.5, position = "identity", aes(y=..density..),  alpha = 0.75) +
labs(x = "value", y = "Frequency") +
geom_density(alpha=0.25)+
theme_classic(base_size = 10, base_line_size = 0.5)

plot_g12.2_log <- plot_g12.2 + 
scale_x_continuous(trans="log2", labels = scales::number_format(accuracy = 0.01, decimal.mark = '.'), breaks = c(0, 0.01, 0.1, 1, 10, 100, 10000), limits=c(-10, 20000))
plot_g12.2_log


But I need them as separate plots. If anyone can help me with that, I'd be very grateful.

Best,

L


Solution

  • I think the reason you're unable to set identical scales is that the lower limit is invalid in log-space, e.g. log2(-100) evaluates to NaN. The lower limit has to be strictly positive. That said, have you considered faceting the data instead?

    library(ggplot2)
    
    set.seed(123); g1 <- data.frame(rlnorm(1000, 1, 3))
    set.seed(123); g2 <- data.frame(rlnorm(2000, 0.4, 1.2))
    colnames(g1) <- "value"; colnames(g2) <- "value"
    
    df <- rbind(
      cbind(g1, name = "G1"),
      cbind(g2, name = "G2")
    )
    
    ggplot(df, aes(value)) +
      geom_histogram(aes(y = after_stat(density)),
                     binwidth = 0.5) +
      geom_density() +
      scale_x_continuous(
        trans = "log2",
        labels = scales::number_format(accuracy = 0.01, decimal.mark = '.'),
        breaks = c(0, 0.01, 0.1, 1, 10, 100, 10000),
        limits = c(1e-3, 20000)) +
      facet_wrap(~ name)
    #> Warning: Removed 4 rows containing non-finite values (stat_bin).
    #> Warning: Removed 4 rows containing non-finite values (stat_density).
    #> Warning: Removed 4 rows containing missing values (geom_bar).
    

    Created on 2021-03-20 by the reprex package (v1.0.0)
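
If the plots really do need to stay separate, one option is to define the scale once and add an identical copy of it to each plot. The sketch below is only an illustration under that assumption: `shared_log2_x()`, `plot_log_hist()`, `shared_x_limits` and `shared_x_breaks` are hypothetical names that do not appear in the original post, and the limits are simply the strictly positive ones used in the faceted example. It keeps the `trans` argument from the code above; recent ggplot2 releases may warn that `transform` is preferred.

```r
library(ggplot2)

set.seed(123); g1 <- data.frame(value = rlnorm(1000, 1, 3))
set.seed(123); g2 <- data.frame(value = rlnorm(2000, 0.4, 1.2))

# Assumed shared settings: both limits are strictly positive, so they are
# valid after the log2 transformation (log2 of a negative number is NaN).
shared_x_limits <- c(1e-3, 20000)
shared_x_breaks <- c(0.01, 0.1, 1, 10, 100, 10000)

# Hypothetical helper: returns a fresh copy of the same x scale on each call.
shared_log2_x <- function() {
  scale_x_continuous(
    trans  = "log2",
    labels = scales::number_format(accuracy = 0.01, decimal.mark = "."),
    breaks = shared_x_breaks,
    limits = shared_x_limits
  )
}

# Hypothetical helper: the same histogram + density layers for any data frame
# that has a `value` column, always finished with the shared x scale.
plot_log_hist <- function(data) {
  ggplot(data, aes(x = value)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = 0.5, alpha = 0.75) +
    geom_density(alpha = 0.25) +
    labs(x = "value", y = "Density") +
    theme_classic(base_size = 10, base_line_size = 0.5) +
    shared_log2_x()
}

plot_g1_log <- plot_log_hist(g1)
plot_g2_log <- plot_log_hist(g2)
```

Printing `plot_g1_log` and `plot_g2_log` then draws two independent plots whose x-axes share the same limits and breaks. As in the faceted version, values that fall outside the limits are dropped with a warning, so the limits should be chosen wide enough to cover both data sets.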