I'm currently rewriting an article that someone else wrote some time ago, and I'm having trouble following it. The article has a table named "Table 2: median and interquartile range of severity of depressive symptoms and serum levels of interleukin-6 and tumor necrosis factor at baseline, after intervention and at 6 and 12 months of follow-up.".
This is the table I'm talking about.
The data frame currently has one variable per measurement, holding each subject's value, such as il6_baseline, il6_6mon, il6_12mon and il6_after (for interleukin-6). The tumor necrosis factor variables follow the same pattern. These are continuous variables.
The "TCC" and "PDSE" labels in the table are two different groups that received different treatments during that period.
I already know how to get the medians and so on. My problem is: what kind of graph would best illustrate this information visually? It would also help if you could give me some basic syntax to work from. I'm new to R and can usually get things done, but I have never worked much with graphs, and now I have this obstacle in front of me.
Thanks for your understanding and attention. Have a good day!
Output from dput of a subset for visualization:
structure(list(a02rec = c(2925, 2461, 2887, 4132, 2734, 4176,
2158, 690, 4287, 2871), ND_IL_6I = c(156.475, 25.393, 5.20696,
29.448, 636.561, 16.7, 20.83028, 13.04912, 17.28, 30.686), ND_IL6_intermed = c(NA,
NA, NA, NA, NA, 4.5048, 49.654, 5.1872, 23.8992, NA), IL_6_6mesesultimovalorITT = c(62.163,
59.278, 45.1272, 19.258, 17.689, 15.864, 16.0992, 22.88964, 14.748,
21.706), modeloterapia = structure(c(2L, 1L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 2L), .Label = c("pdse", "tcc"), class = "factor")), row.names = c(NA,
10L), class = "data.frame")
In the subset above, the "a02rec" variable does not matter; it is just an identifier. The variables starting with "IL_6" and "ND_IL" hold the collected serum levels, and the "modeloterapia" variable records whether the subject attended the PDSE or the TCC therapy model. I want to create the graph I described in the previous comment: three graphs, one for each group (PDSE, TCC and total sample), with some kind of boxes showing the serum levels at these time points.
I'm not sure whether a "box-like" graph or a "point/dot-like" graph would better show what I want to achieve. I'd like the graphs to show the change in serum levels between periods (initial/baseline, 6 months, 12 months and after treatment).
The key to using ggplot2 effectively is converting your data to a long format.
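library(dplyr)
library(tidyr)
# df is the data frame from the dput() output in the question.
# Stack the IL columns, then split each column name into three parts.
long_data = df %>%
  pivot_longer(matches("IL")) %>%
  separate(name, sep = "_", into = c("drug", "something", "time"))
head(long_data)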
# # A tibble: 6 x 6
#   a02rec modeloterapia drug  something time                 value
#    <dbl> <fct>         <chr> <chr>     <chr>                <dbl>
# 1   2925 tcc           ND    IL        6I                   156.
# 2   2925 tcc           ND    IL6       intermed              NA
# 3   2925 tcc           IL    6         6mesesultimovalorITT  62.2
# 4   2461 pdse          ND    IL        6I                    25.4
# 5   2461 pdse          ND    IL6       intermed              NA
# 6   2461 pdse          IL    6         6mesesultimovalorITT  59.3
I'm at a loss about some of the meaning of your data, but have taken guesses that hopefully you can correct.
With data in that format, plotting is relatively straightforward. Here is an example (which looks a little weird due to the small sample of data).
library(ggplot2)
# One boxplot per time point, with one panel per therapy model
ggplot(long_data, aes(x = time, y = value, fill = drug)) +
  geom_boxplot() +
  facet_wrap(vars(modeloterapia), ncol = 1)
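Since the table in the question reports medians and interquartile ranges, a point-and-range plot is one possible alternative to boxes. What follows is only a sketch under assumptions: `df_demo` and `long_demo` are hypothetical names, the data are two columns copied from the posted subset, and the labels "baseline" and "6 months" are my guesses at what those columns mean.

```r
library(tidyr)
library(ggplot2)

# Two columns copied from the posted subset; df_demo is a hypothetical name
df_demo <- data.frame(
  ND_IL_6I = c(156.475, 25.393, 5.20696, 29.448, 636.561,
               16.7, 20.83028, 13.04912, 17.28, 30.686),
  IL_6_6mesesultimovalorITT = c(62.163, 59.278, 45.1272, 19.258, 17.689,
                                15.864, 16.0992, 22.88964, 14.748, 21.706),
  modeloterapia = factor(c("tcc", "pdse", "pdse", "tcc", "pdse",
                           "pdse", "tcc", "tcc", "pdse", "tcc"))
)

# Long format: one row per subject and time point
long_demo <- pivot_longer(
  df_demo,
  cols = c(ND_IL_6I, IL_6_6mesesultimovalorITT),
  names_to = "time",
  values_to = "il6"
)

# Assumed meaning of the columns: "6I" is baseline, the other is 6 months
long_demo$time <- factor(
  long_demo$time,
  levels = c("ND_IL_6I", "IL_6_6mesesultimovalorITT"),
  labels = c("baseline", "6 months")
)

# Point at the median, range from the 25th to the 75th percentile
p <- ggplot(long_demo, aes(x = time, y = il6, colour = modeloterapia)) +
  stat_summary(
    fun = median,
    fun.min = function(v) unname(quantile(v, 0.25)),
    fun.max = function(v) unname(quantile(v, 0.75)),
    position = position_dodge(width = 0.3)
  )
p
```

Dodging the two groups side by side makes it easy to compare them at each time point; swapping `colour = modeloterapia` for `facet_wrap(vars(modeloterapia))` would give one panel per group instead.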
If you need to reorder the x-axis, turn the variable on the x-axis into a factor with the levels in the order you want, as in this answer.
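As a minimal sketch of that reordering, using base R only: the data frame `toy` is hypothetical (a few values taken from the posted subset), and the chronological order and the readable labels are my assumptions about what the time codes mean.

```r
# Hypothetical long-format data using the time codes produced by separate()
toy <- data.frame(
  time = c("6mesesultimovalorITT", "6I", "intermed", "6I"),
  value = c(62.163, 156.475, 4.5048, 25.393)
)

# Assumed chronological order: baseline, intermediate, 6 months.
# levels sets the order on the axis, labels sets what is printed.
toy$time <- factor(
  toy$time,
  levels = c("6I", "intermed", "6mesesultimovalorITT"),
  labels = c("baseline", "intermediate", "6 months")
)

levels(toy$time)
```

ggplot2 draws a factor on the x-axis in the order of its levels, so the same call applied to `long_data$time` should put the boxes in that order.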