
How to show p-value and axis value correctly in R?


I am trying to submit an article with statistical data and figures. I like R and used it for the analysis. I wrote this code:

library(ggplot2)
library(ggpval)  # assumed source of add_pval()

graph_a =
  ggplot(df, aes(x = group, y = squareBowmansCapsule, fill = isInfected)) +
  geom_boxplot() +
  xlab(label = "Groups") +
  ylab(label = "Bowman's capsule area sq.µm") +
  scale_fill_discrete(name = "") +
  theme(axis.text.x = element_text(size = 10),axis.title.x = element_text(size = 10)) +
  theme(axis.text.y = element_text(size = 10),axis.title.y = element_text(size = 10))+
  theme(axis.text.x = element_blank())+
  labs (title = "Bowman's capsule area") +
  theme(legend.position = c(0.1, 1),
        legend.direction = "vertical")

graph_a1 = graph_a +
  annotate(
    "text",
    x = c(1, 2, 3, 4),
    y = -1.5,
    label = c("Control", "1 group", "2 group", "3 group")
  )
graph_a_with_pValue1=add_pval(graph_a1, 
                 pairs = list(c(1, 2),c(1,3),c(1,4)
                 ),
                 test='kruskal.test',heights=c(14000,16000,18500))

and got a figure like this: [figure]

The editor's remarks were:

  1. Please use commas to separate thousands for numbers with five or more digits (not four digits) in the picture, e.g., “10000” should be “10,000”.
  2. Please change the terms into scientific notation in the figure, e.g., “2 × 10⁻¹⁶”, not “2e−16”.
  3. Please change P to lower case.

I solved the problem with the following steps:

  1. Manually set the y-axis labels in each figure with this code:
scale_y_continuous(labels = c("0", "5000", "10,000", "15,000", "20,000")) 

  2. Then manually added the p-value annotations:

pval_annotations =  c("'p = 2 × 10⁻¹³'",
                      "'p < 2 × 10⁻¹⁶'",
                      "'p = 4.7 × 10⁻¹¹'")

graph_a_with_pValue = add_pval(
  graph_a1,
  textsize = 8,
  annotation = pval_annotations,
  pairs = list(c(1, 2), c(1, 3), c(1, 4)),
  heights = c(14000, 16000, 18500)
)

Finally, I ended up with this code:

graph_a =
  ggplot(df, aes(x = group, y = squareBowmansCapsule, fill = isInfected)) +
  geom_boxplot() +
  xlab(label = "Groups") +
  ylab(label = "Bowman's capsule area sq.µm") +
  scale_fill_discrete(name = "") +
  theme(
    axis.text.x = element_text(size = 12, face = "bold"),
    axis.title.x = element_text(size = 14, face = "bold")
  ) +
  theme(
    axis.text.y = element_text(size = 12, face = "bold"),
    axis.title.y = element_text(size = 14, face = "bold")
  ) +
  scale_y_continuous(labels = c("0", "5000", "10,000", "15,000", "20,000")) +
  theme(axis.text.x = element_blank()) +
  labs (title = "Bowman's capsule area") +
  theme(legend.position = c(0.1, 1),
        legend.direction = "vertical")



graph_a1 = graph_a +
  annotate(
    "text",
    x = c(1, 2, 3, 4),
    y = -1.5,
    label = c("Control", "1 group", "2 group", "3 group")
  )

pval_annotations =  c("'p = 2 × 10⁻¹³'",
                      "'p < 2 × 10⁻¹⁶'",
                      "'p = 4.7 × 10⁻¹¹'")


graph_a_with_pValue = add_pval(
  graph_a1,
  textsize = 8,
  annotation = pval_annotations,
  pairs = list(c(1, 2), c(1, 3), c(1, 4)),
  heights = c(14000, 16000, 18500)
)

and this figure as the result: [figure]

My question is: how can I get the same result without so much manual effort?


Solution

  • Two helper functions, both using scales:: as a starting point:

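    # Comma-separate thousands, but strip the comma from four-digit numbers
    # ("5,000" becomes "5000", while "10,000" is left alone).
    mycomma <- function(z) {
      out <- scales::label_comma()(z)
      sub("^([0-9]),([0-9]{3})$", "\\1\\2", out)
    }
    # Turn "2e-16" into the plotmath expression 2 %*% 10^-16, drawn as 2 x 10^-16.
    myscientific <- function(z) {
      out <- scales::label_scientific()(z)
      out <- parse(text = sub("e", "%*% 10^", out))
      out[abs(z) < 1e-99] <- "0" # otherwise we see "0 x 10^+0"
      out
    }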
    

    The abs(z) < 1e-99 threshold may be sensitive to your actual data. It is used instead of z == 0 to work around the floating-point comparison problem described in "Why are these numbers not equal?" (and R FAQ 7.31), especially since we are dealing with high-precision near-zero numbers. With this sample data (in my hasty testing), it worked with thresholds as small as abs(z) < 1e-323, while 1e-324 showed 0 x 10^+0 (1e-324 underflows to exactly 0 in double precision, so that comparison can never be true).
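
    As a minimal sketch of the comparison problem behind that threshold, the base-R snippet below shows a value that is mathematically zero but not exactly zero in floating point. The variable names and the tolerance sqrt(.Machine$double.eps) are illustrative choices, not part of the helper above; pick a tolerance that suits the scale of your own data.

    ```r
    # 0.1 + 0.2 - 0.3 is zero on paper but leaves a tiny residue in double precision
    z <- 0.1 + 0.2 - 0.3

    exact_zero  <- z == 0                              # exact comparison misses it
    tiny_cutoff <- abs(z) < 1e-99                      # the residue (about 5.6e-17) is far above 1e-99
    loose_zero  <- abs(z) < sqrt(.Machine$double.eps)  # a tolerance near 1.5e-8 treats it as zero

    # 1e-324 is below the smallest representable double, so it is stored as exactly 0
    underflows <- 1e-324 == 0

    print(c(exact_zero = exact_zero, tiny_cutoff = tiny_cutoff,
            loose_zero = loose_zero, underflows = underflows))
    ```

    A residue like this would slip past the 1e-99 cutoff, so the threshold may need adjusting when axis breaks come out of arithmetic rather than literal values.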

    Sample data:

    dat <- data.frame(x=seq(2000, 15000, length.out = 5), y=seq(2e-10, 2e-5, length.out = 5))
    

    A plot, using our two helper functions in the labels= argument for each axis:

    library(ggplot2)
    ggplot(dat, aes(x, y)) +
      geom_point() +
      scale_x_continuous(labels = mycomma) +
      scale_y_continuous(labels = myscientific)
    

    [Figure: ggplot with custom scientific and comma labelling]
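
  • For the p-value annotations, a similar helper could build the labels from the numeric p-values instead of typing them by hand. This is a minimal base-R sketch, not part of the original answer: format_pval and its arguments digits and limit are hypothetical names, and limit = 2e-16 simply mirrors the "p < 2 × 10⁻¹⁶" label in the question. It returns plotmath strings with a lower-case p.

    ```r
    # Hypothetical helper: turn numeric p-values into plotmath label strings.
    format_pval <- function(p, digits = 2, limit = 2e-16) {
      below <- p < limit                 # values too small to report exactly
      shown <- ifelse(below, limit, p)   # report the limit itself for those
      # formatC gives e.g. "4.7e-11"; split it into mantissa and exponent
      parts <- strsplit(formatC(shown, format = "e", digits = digits - 1),
                        "e", fixed = TRUE)
      mantissa <- vapply(parts, function(s) as.numeric(s[1]), numeric(1))
      exponent <- vapply(parts, function(s) as.integer(s[2]), integer(1))
      # In plotmath, "==" draws as "=" and "%*%" draws as the multiplication sign
      paste0("p ", ifelse(below, "<", "=="), " ", mantissa, " %*% 10^", exponent)
    }

    pval_labels <- format_pval(c(2e-13, 1e-20, 4.7e-11))
    print(pval_labels)

    # Check that every label is a valid R expression, as parse = TRUE requires
    invisible(lapply(pval_labels, function(l) parse(text = l)))
    ```

    With plain ggplot2, such strings can be drawn with annotate("text", ..., label = pval_labels, parse = TRUE). The inner quotes in the question's pval_annotations suggest that add_pval() parses its annotation strings too, but that is an assumption to check against the ggpval documentation.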