Tags: r, parsing, ggplot2, subscript, chemistry

Correct display of chemical formulae in ggplot axis category labels


I'm plotting a data set with chemical formulae as categories, and values associated with each:

data <- data.frame(compound = factor(c("SiO[2]", "Al[2]O[3]", "CaO")),
                value = rnorm(3, mean = 1, sd = 0.25))

I want to get the subscripts in the chemical formulae to display correctly in the axis labels. I've tried various solutions involving bquote(), label_parsed(), scales::parse_format() and ggplot2:::parse_safe (as per this thread), but all of those give me either no category labels at all or a mess. For example:

library(ggplot2)

ggplot(data = data, aes(x = compound, y = value)) +
  geom_col() +
  scale_x_discrete(labels = scales::parse_format())

Gives this error message:

Error in parse(text = x, srcfile = NULL) : 1:6: unexpected symbol
1: Al[2]O
         ^

Can anyone help? I've done this successfully before for the axis titles (via labs() with bquote() or similar), and I can see various threads on that problem, but the same solutions don't seem to work for category labels.


Solution

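  • UPDATED: Finally got the right parse() routine: if the formulae in the data frame are already written as valid plotmath expressions, they can simply be parsed to produce the proper labels. Note that aluminum oxide needs the tilde (~) character between Al[2] and O[3], because Al[2]O[3] on its own is not a valid R expression. In plotmath, ~ joins the two parts with a space, and * would join them without one.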

    library(tidyverse)
    library(rlang)
    #> 
    #> Attaching package: 'rlang'
    #> The following objects are masked from 'package:purrr':
    #> 
    #>     %@%, as_function, flatten, flatten_chr, flatten_dbl, flatten_int,
    #>     flatten_lgl, flatten_raw, invoke, list_along, modify, prepend,
    #>     splice
    compounds <- c("SiO[2]", "Al[2]~O[3]", "CaO")
    data <- tibble(compound = compounds,
                   value = rnorm(3, mean = 1, sd = 0.25))
    data %>%
      ggplot(aes(x = compound, y = value)) +
      geom_col() +
      scale_x_discrete(labels = rlang::parse_exprs)
    

    Created on 2019-11-21 by the reprex package (v0.3.0)
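
If the formulae arrive in the unparseable form used in the question (for example "Al[2]O[3]"), the separator could be inserted automatically rather than by hand. The following is a minimal sketch in base R, not part of ggplot2 or rlang: `to_plotmath()` and `parses()` are hypothetical helper names, and the regular expression assumes simple formulae in which a closing bracket directly followed by a letter marks the start of the next element. It uses `*` rather than `~` so that no space appears between the parts.

```r
# Hypothetical helper (an assumption, not a ggplot2/rlang function):
# insert `*` wherever a closing bracket is directly followed by a letter,
# e.g. "Al[2]O[3]" becomes "Al[2]*O[3]".
to_plotmath <- function(x) {
  gsub("\\](?=[A-Za-z])", "]*", as.character(x), perl = TRUE)
}

# Hypothetical helper: TRUE if the string parses as R code, FALSE otherwise.
parses <- function(s) {
  tryCatch({
    parse(text = s)
    TRUE
  }, error = function(e) FALSE)
}

raw   <- c("SiO[2]", "Al[2]O[3]", "CaO")
fixed <- to_plotmath(raw)

parses("Al[2]O[3]")                      # FALSE: the raw string is not valid R syntax
all(vapply(fixed, parses, logical(1)))   # TRUE: every converted string parses

# Expression vector with one entry per label, suitable for plotmath rendering.
labels_expr <- parse(text = fixed)
```

With this in place, something like `scale_x_discrete(labels = function(x) parse(text = to_plotmath(x)))` should let the original strings stay unchanged in the data. Formulae with parentheses, charges or hydrates would likely need a more careful rule.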


    PREVIOUS UPDATE: Replaced the code with a slightly more extensible version that uses a translation table to look up the bquote() expressions. Same basic idea, but the labels are now matched to categories by name instead of being hard-wired, so it should work with filters, facets, etc.

    
    library(tidyverse)
    compounds <- c("SiO[2]", "Al[2]O[3]", "CaO")
    # Map each category string in the data to the expression to display
    translation <- c("SiO[2]" = bquote(SiO[2]),
                     "Al[2]O[3]" = bquote(Al[2] ~ O[3]),
                     "CaO" = bquote(CaO))
    data <- tibble(compound = compounds,
                   value = rnorm(3, mean = 1, sd = 0.25))
    ggplot(data = data, aes(x = compound, y = value)) +
      geom_col() + 
      scale_x_discrete(labels = translation)
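
Because the translation table is looked up by name, a category without an entry cannot be given a formatted label. A small check before plotting can catch that. This is a sketch in base R under stated assumptions: `unmapped()` is a hypothetical helper, the keys follow the question's data, and `quote()` stands in for `bquote()` since nothing is being substituted into the expressions.

```r
# Named list of plotmath expressions, keyed by the strings stored in the data.
translation <- list(
  "SiO[2]"    = quote(SiO[2]),
  "Al[2]O[3]" = quote(Al[2] * O[3]),
  "CaO"       = quote(CaO)
)

compounds <- c("SiO[2]", "Al[2]O[3]", "CaO")

# Hypothetical helper: list the categories that have no entry in the table.
unmapped <- function(categories, table) {
  setdiff(unique(as.character(categories)), names(table))
}

unmapped(compounds, translation)             # character(0): every category is covered
unmapped(c(compounds, "MgO"), translation)   # "MgO": this category still needs an entry
```

Running such a check after filtering or before faceting keeps the table and the data in step as new compounds are added.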