I am trying to find an R code to make a plot that looks like this (below). I know this figure was created in python because I emailed the person who created this code but after 2 months of back and forth with him promising to get me the code, he ghosted me. I also tried contacting my Data Analytics Core and tried searching for a code I could build off of but didn't find anything even close to what I am trying to accomplish so I decided to ask the hive mind. I've used python in the past so I could manage following a python script but I primarily use R to plot data.
The publication this figure is from- https://pubmed.ncbi.nlm.nih.gov/36599299/
The data I plan to use to create the plot is circadian data.
It contains: sample = three datasets (BAT, WAT and in vitro). process = the process that was found to be circadianly regulated (i.e. the thing being measured). ct = the timing that subprocesses happen during the circadian day with the day having 24hrs and wrapping around at 24/0h.
Here is a sample of my data for "BAT" - sample process ct
sample process ct
BAT CellCycle 1.140128762
BAT CellCycle 3.106516768
BAT CellCycle 3.738113578
BAT CellCycle 23.44912305
BAT CellCycle 23.49744379
BAT CellCycle 23.51422441
BAT CellCycle 23.33041125
BAT CellCycle 22.19656322
BAT CellCycle 22.84183477
BAT ECM 22.04149961
BAT ECM 23.45526633
BAT ECM 23.36802574
BAT ECM 23.40688727
BAT ECM 1.732071528
BAT ECM 1.529777828
BAT ECM 20.29377917
BAT Energy 2.19647636
BAT Energy 4.177766894
BAT Energy 2.624942744
BAT Energy 1.633080723
BAT Energy 2.25815373
BAT Energy 22.52527935
BAT Energy 21.844178
BAT Energy 1.44484992
BAT Energy 1.710318176
BAT Energy 1.721398481
BAT Energy 5.0700794
BAT Energy 3.252122282
BAT Energy 19.69284419
BAT Energy 2.521538024
BAT Energy 2.309621407
BAT Energy 1.979057875
BAT Energy 1.747326389
BAT Energy 1.620003554
BAT Energy 4.512535826
BAT Energy 19.48505391
BAT Energy 3.96613478
There will ultimately be one plot per sample (3 plots) showing the phase distribution of each process with the median and IQR depicted.
It's difficult to know with only a small sample of your data, but here's the basic principle of how you could do it in R using ggplot
:
library(tidyverse)
df %>%
mutate(ct = ifelse(ct > 12, ct - 24, ct)) %>%
ggplot(aes(as.numeric(factor(process)), ct, fill = process)) +
geom_vline(xintercept = 1:3, linewidth = 7, color = 'gray92') +
stat_boxplot(geom = 'errorbar', width = 0.6, aes(color = process)) +
geom_boxplot(width = 0.6, linewidth = 0, outlier.colour = NA) +
stat_boxplot(geom = 'errorbar', width = 0.6,
aes(ymax = after_stat(middle), ymin = after_stat(middle))) +
geom_hline(yintercept = c(-12, -8, -4, 0, 4, 8), linetype = 5,
alpha = 0.2) +
annotate('rect', xmax = -2, xmin = -Inf, ymin = -Inf, ymax = Inf,
fill = 'white') +
annotate('text', x = -4, y = 0, label = 'BAT', fontface = 2, size = 4) +
coord_polar(theta = 'y', start = pi) +
theme_void() +
scale_x_continuous(limits = c(-4, 3.3)) +
scale_y_continuous(limits = c(-12, 12), breaks = -12:11,
labels = ~sprintf('%02d:00', ifelse(.x < 0, .x + 24,.x))) +
theme(axis.text.x = element_text(),
legend.position = 'top') +
scale_color_manual(values = c('#ca6c28', '#aa2a25', '#538335')) +
scale_fill_manual(values = c('#ca6c28', '#aa2a25', '#538335'))
You will need to be careful to pick a safe 'wrap-around' time for each plot such that none of your samples have a range that crosses the wrap-around time. In your sample data, 12 midday is a safe time because none of your samples' ranges cross midday.
Data taken from question in reproducible format
df <- structure(list(sample = c("BAT", "BAT", "BAT", "BAT", "BAT",
"BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT",
"BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT",
"BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT",
"BAT", "BAT", "BAT", "BAT", "BAT"), process = c("CellCycle",
"CellCycle", "CellCycle", "CellCycle", "CellCycle", "CellCycle",
"CellCycle", "CellCycle", "CellCycle", "ECM", "ECM", "ECM", "ECM",
"ECM", "ECM", "ECM", "Energy", "Energy", "Energy", "Energy",
"Energy", "Energy", "Energy", "Energy", "Energy", "Energy", "Energy",
"Energy", "Energy", "Energy", "Energy", "Energy", "Energy", "Energy",
"Energy", "Energy", "Energy"), ct = c(1.140128762, 3.106516768,
3.738113578, 23.44912305, 23.49744379, 23.51422441, 23.33041125,
22.19656322, 22.84183477, 22.04149961, 23.45526633, 23.36802574,
23.40688727, 1.732071528, 1.529777828, 20.29377917, 2.19647636,
4.177766894, 2.624942744, 1.633080723, 2.25815373, 22.52527935,
21.844178, 1.44484992, 1.710318176, 1.721398481, 5.0700794, 3.252122282,
19.69284419, 2.521538024, 2.309621407, 1.979057875, 1.747326389,
1.620003554, 4.512535826, 19.48505391, 3.96613478)), row.names = c(NA,
-37L), class = "data.frame")