
Transform only one axis to log10 scale with ggplot2


I have the following problem: I would like to visualize a discrete and a continuous variable on a boxplot, where the continuous variable has a few extremely high values. These make the boxplot meaningless (the points and even the "body" of the chart are too small), which is why I would like to show it on a log10 scale. I am aware that I could leave the extreme values out of the visualization, but I do not want to.

Let's see a simple example with diamonds data:

m <- ggplot(diamonds, aes(y = price, x = color))
m + geom_boxplot()


The problem is not serious here, but I hope you can see why I would like to show the values on a log10 scale. Let's try it:

m + geom_boxplot() + coord_trans(y = "log10")


As you can see, the y axis is log10-scaled and looks fine, but there is a problem with the x axis that makes the plot look very strange.

The problem does not occur with scale_y_log10(), but that is not an option for me, as I cannot use a custom formatter this way. E.g.:

m + geom_boxplot() + scale_y_log10() 


My question: does anyone know how to draw the boxplot with a log10 scale on the y axis whose labels can be freely formatted with a formatter function, as in this thread?


Editing the question to help answerers based on answers and comments:

What I am really after: one log10-transformed axis (y) with non-scientific labels. I would like to label it as dollars (formatter=dollar) or in any custom format.

If I try @hadley's suggestion I get the following warnings:

> m + geom_boxplot() + scale_y_log10(formatter=dollar)
Warning messages:
1: In max(x) : no non-missing arguments to max; returning -Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In max(x) : no non-missing arguments to max; returning -Inf

The y axis labels remain unchanged.



Solution

  • The simplest approach is to pass the name of the desired log function to the 'trans' (formerly 'formatter') argument of either scale_x_continuous or scale_y_continuous:

    library(ggplot2)  # which formerly required pkg:plyr
    m + geom_boxplot() + scale_y_continuous(trans='log10')
    

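    EDIT: Or, if you don't like that, either of these appears to give different but useful results (note that 'log' is documented as an argument of qplot(), and ggplot() documents its extra '...' arguments as unused, so in current versions these may look the same as the untransformed plot):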

    m <- ggplot(diamonds, aes(y = price, x = color), log="y")
    m + geom_boxplot() 
    m <- ggplot(diamonds, aes(y = price, x = color), log10="y")
    m + geom_boxplot()
    

    EDIT2 & 3: Further experiments (after discarding the one that attempted successfully to put "$" signs in front of logged values):

    # The label function must accept a vector of break values (x), which are
    # on the log10 scale here, and return them formatted as thousands of dollars
    fmtExpLg10 <- function(x) paste(plyr::round_any(10^x/1000, 0.01) , "K $", sep="")
    
    ggplot(diamonds, aes(color, log10(price))) + 
      geom_boxplot() + 
      # a formatting function goes in 'labels'; 'trans' expects a transformation
      scale_y_continuous("Price, log10-scaling", labels = fmtExpLg10)
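
A variant of the same idea, as a minimal sketch: instead of logging `price` by hand and undoing it in the formatter, let the scale do the transformation. With a transformed scale, the function given to `labels` receives the break values in the original data units, so no `10^x` is needed. The helper name `fmt_k_dollar` is hypothetical and is not part of the original answer.

```r
library(ggplot2)

# Hypothetical helper (an assumption for illustration): takes break values
# in original units (dollars) and formats them as thousands of dollars.
fmt_k_dollar <- function(x) sprintf("%.1fK $", x / 1000)

p <- ggplot(diamonds, aes(x = color, y = price)) +
  geom_boxplot() +
  # scale_y_log10() log10-transforms the y scale; 'labels' formats the ticks
  scale_y_log10(name = "Price (log10 scale)", labels = fmt_k_dollar)

p
```

Because the data stay in dollars, any other layer added to `p` sees real prices rather than logged ones.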
    


    Note added mid 2017 in comment about package syntax change:

    scale_y_continuous(formatter = 'log10') is now scale_y_continuous(trans = 'log10') (ggplot2 v2.2.1)
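
For the goal stated in the question (one log10 y axis with dollar labels), the following is a minimal sketch in more recent ggplot2 syntax. It assumes the scales package is installed (it is a dependency of ggplot2) and uses its `dollar` labelling function. The break positions are illustrative choices, not values from the original post.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = color, y = price)) +
  geom_boxplot() +
  scale_y_log10(
    breaks = c(300, 1000, 3000, 10000),  # illustrative tick positions (assumption)
    labels = scales::dollar              # formats break values such as 1000 as "$1,000"
  )

p
```

Any function that maps a numeric vector of breaks to a character vector can be passed to `labels` in place of `scales::dollar`.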