Tags: r, ggplot2, scatter-plot, boxplot, jitter

Two things: how to change y-axis values to something more manageable while retaining the log scale, and how to overlay a scatter plot of the data on the boxplot?


(see image in link for better explanation)

Trying to plot a log boxplot. I am very new to R and have tried to read tutorials but they all seem to use a different plotting function?

1/ I would like to know how to change y-axis values (i.e. to 0.001, 0.01, 0.1, 1 etc.) whilst retaining log scale?

2/ I would also like to know how to overlay a scatter plot of the data over the box?

3/ Finally, advice on how to add gridlines and border, of chosen weight and colour, and axis titles would be great?

So far, only code used is:

boxplot(box,
        varwidth = TRUE, log = "y", las = 1)

Sorry it's so obvious but thanks guys!

Reproducible data (first 30 data points):

box <- structure(list(CD = c(0.291998350286, 58.4266839332, 1.27227891359, 
7.05106388302, 0.000175203165079, 14.5665189804, 0.991317477169, 
1.56817217741, 30.4733699427, 0.421737157934, 1.42372160368, 
0.333712081068, 0.126643859356, 0.339337851064, 0.151788605996, 
3.81711532569, 1.54344215823, 17.2540240816, 3.67548135199, 4.08331544672, 
0.0549081111653, 0.0734888395127, 5.16751927204, 22.6971132167, 
1.04321972985, 0.184343635879, 2.29291935133, 0.0555342051937, 
0.411328596454, 51.3157360015), WD = c(0.402162969955, 0.189544929529, 
0.000840280055822, 0.0501429051167, 3.4853343866, 0.0286017538011, 
0.0121948073037, 0.992426638872, 0.0192559537415, 0.00398698494632, 
0.888543226817, 0.703331842713, 0.378008558951, 4.70639786908, 
0.113706495683, 1.32546254378, 0.936899368015, 0.108969215053, 
0.25593198462, 0.564518000036, 0.121389166752, 0.195884521759, 
0.704964462359, 1.25602965005, 0.0242662609253, 2.11883481514, 
0.44581781826, 0.659586439033, 0.36869665263, 0.824802234027), 
    MC = c(0.0817800846374, 1.70562818122, 0.0807325401412, 0.180484111266, 
    0.0438908620273, 8.75617400342, 0.479370274286, 0.908307567192, 
    2.81446961622, 0.0699990348088, 0.0491805903311, 0.00573142245572, 
    0.116352754956, 0.311847695137, 0.0414215549125, 0.104499713126, 
    0.0551723673287, 0.076199002014, 0.191940770942, 4.11745930602, 
    1.75751348869, 0.0517694407553, 2.29459310871, 0.0269233884783, 
    0.097992042257, 11.7325079183, 0.262543381616, 0.748125397347, 
    0.635821595694, 0.794256126423), WC = c(0.0686062258206, 
    0.514240129693, 7.68226019254, 4.36776848419, 0.618214352027, 
    2.13911888244, 0.0392505689889, 0.0823059942863, 2.36466448826, 
    0.0688590035687, 0.151457824484, 0.260629997743, 8.30460664472, 
    0.235838508742, 0.41960151168, 4.38818043685, 0.0797918590848, 
    0.109025596179, 0.0837286212892, 0.0117251770506, 1.17739717792, 
    0.207413909376, 8.62180088733, 2.33021344099, 0.166981061366, 
    1.13410263425, 0.0905601584251, 0.154075808752, 0.140498581833, 
    0.213863468391), MWC = c(301.891645135, 0.672405306137, 0.105110378336, 
    5.36947765018, 0.672138277335, 3.58296467263, 10.7754596083, 
    5.01795685162, 0.0775842457366, 1.07683084271, 1.0360624974, 
    16.8763517534, 0.390002867544, 1.50618637339, 0.371973397842, 
    1.28366689573, 0.0633246500391, 0.0364964802158, 0.249895194073, 
    0.0379084221473, 0.0798275709535, 0.504735639066, 8.12262202509, 
    82.5787360252, 0.068574731873, 8.76779568117, 0.00873932360562, 
    0.0142029221366, 0.0228083224849, 0.146073745479)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -30L))

Solution

  • Lots of questions in one here, which really boil down to "how to use ggplot2". Here's a good introductory guide.

    First, your data are in "wide" format, but ggplot2 works better with "long" format (one column for the variable names, one for their values). We can use tidyr::pivot_longer() for that. By default it generates new columns called name and value.
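
    To make the reshaping step concrete, here is a minimal sketch on a made-up three-row data frame. The object names wide and long and the rounded values are illustrative assumptions, not part of the original data; the real call would use box instead.

    ```r
    library(tidyr)

    # Hypothetical stand-in for `box`: two of its columns, three rounded values each
    wide <- data.frame(
      CD = c(0.29, 58.4, 1.27),
      WD = c(0.40, 0.19, 0.00084)
    )

    # Stack every column into two: `name` (the old column name) and `value`
    long <- pivot_longer(wide, everything())

    long
    ```

    Each of the 3 rows in 2 columns becomes one row of the long table, so long has 6 rows.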

    For a boxplot we use geom_boxplot(). By "scatter plot" I think you mean "jitter plot", which is the usual way to overlay individual data points on a boxplot. The appropriate function is geom_jitter().
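
    As a rough sketch of how the two layers combine, assuming data already in long format (the small data frame below is invented for illustration): outlier.shape = NA hides the boxplot's own outlier points so they are not drawn twice, and height = 0, which is an addition here rather than something the answer uses, restricts the jitter to the horizontal direction.

    ```r
    library(ggplot2)

    # Hypothetical long-format data: group label in `name`, measurement in `value`
    long <- data.frame(
      name  = rep(c("CD", "WD"), each = 3),
      value = c(0.29, 58.4, 1.27, 0.40, 0.19, 0.00084)
    )

    set.seed(1)  # jitter is random; fix the seed for a reproducible picture

    p <- ggplot(long, aes(name, value)) +
      geom_boxplot(outlier.shape = NA) +     # box only, no separate outlier dots
      geom_jitter(width = 0.2, height = 0)   # raw points, spread sideways only

    p
    ```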

    Labels for y-axis values can be altered in several different ways. One is to use functions from the scales package. Another is to supply a labelling function - see the code below.
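
    If you want the axis to read exactly 0.001, 0.01, 0.1, 1 and so on, one option is to set the breaks yourself and pair them with a labelling function that drops trailing zeros. This is only a sketch: plain_labels and the small data frame are hypothetical names introduced for illustration, and the range of breaks is an assumption you would adapt to your data.

    ```r
    library(ggplot2)

    # Hypothetical helper: plain decimal labels without trailing zeros
    plain_labels <- function(x) {
      format(x, scientific = FALSE, drop0trailing = TRUE, trim = TRUE)
    }

    # Hypothetical long-format data, standing in for the reshaped `box`
    long <- data.frame(
      name  = rep(c("CD", "WD"), each = 3),
      value = c(0.29, 58.4, 1.27, 0.40, 0.19, 0.00084)
    )

    p <- ggplot(long, aes(name, value)) +
      geom_boxplot() +
      # one break per power of ten, from 0.0001 up to 100
      scale_y_log10(breaks = 10^(-4:2), labels = plain_labels)

    p
    ```

    The scale stays logarithmic; only the text printed at each break changes.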

    Axis titles can be added using the labs() function.

    Gridlines and border of chosen weight and color: well, it depends what you want exactly, but in general you would use theme() and look for arguments related to panel. In the example code below we add a thick red border.
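
    For the gridline part of the question, a sketch of the relevant theme() arguments might look like this. The colours and widths are arbitrary choices, panel_style is a hypothetical name, and linewidth assumes ggplot2 3.4.0 or later (older versions use size instead).

    ```r
    library(ggplot2)

    # Reusable styling: add it to any plot with `+ panel_style`
    panel_style <- theme_bw() +
      theme(
        # major gridlines: light grey, medium weight
        panel.grid.major = element_line(colour = "grey70", linewidth = 0.5),
        # switch minor gridlines off entirely
        panel.grid.minor = element_blank(),
        # border around the plotting area; fill = NA keeps the panel visible
        panel.border = element_rect(fill = NA, colour = "black", linewidth = 1)
      )
    ```

    Keeping the styling in its own object makes it easy to try different weights and colours without touching the rest of the plot code.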

    So putting all of that together:

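    library(ggplot2)
    library(tidyr)
    library(dplyr)  # provides the %>% pipe
    
    box %>% 
      # reshape wide -> long: creates columns `name` and `value`
      pivot_longer(everything()) %>% 
      ggplot(aes(name, value)) + 
      # hide boxplot outliers, since geom_jitter() draws every point anyway
      geom_boxplot(outlier.shape = NA) + 
      geom_jitter(width = 0.2) + 
      # log10 y-axis with plain decimal labels instead of scientific notation
      scale_y_log10(labels = function(x) format(x, scientific = FALSE)) +
      theme_bw() + 
      # thick red panel border; `linewidth` needs ggplot2 >= 3.4.0 (older versions use `size`)
      theme(panel.border = element_rect(fill = NA, color = "red", linewidth = 2)) +
      labs(x = "Group", y = "Value")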
    

    Result: [image of the resulting plot]

    Hope that helps you to get started.