Tags: r, ggplot2, data-visualization, median

How to turn a table into a ggplot graph?


I'm currently rewriting an article that someone else wrote some time ago, and I'm having trouble following it. The article has a table titled "Table 2: median and interquartile range of severity of depressive symptoms and serum levels of interleukin-6 and tumor necrosis factor at baseline, after intervention and at 6 and 12 months of follow-up.".

This is the table I'm talking about.

[Image: Table 2 from the article]

The data frame currently has one variable per measurement, holding each subject's value: for interleukin-6, for example, il6_baseline, il6_after, il6_6mon and il6_12mon. The tumor necrosis factor variables follow the same pattern. All of these are continuous variables.

"TCC" and "PDSE" in the table are two groups that received different treatments during that period.

I already know how to get the medians and so on. My problem is: what kind of graph best illustrates this information visually? It would also help to have some basic syntax that I could build on. I'm new to R; I can usually get things done, but I have never worked much with graphs, and now I'm stuck.

Thanks for your understanding and attention. Have a good day!

Output from dput of a subset for visualization:

structure(list(a02rec = c(2925, 2461, 2887, 4132, 2734, 4176, 
2158, 690, 4287, 2871), ND_IL_6I = c(156.475, 25.393, 5.20696, 
29.448, 636.561, 16.7, 20.83028, 13.04912, 17.28, 30.686), ND_IL6_intermed = c(NA, 
NA, NA, NA, NA, 4.5048, 49.654, 5.1872, 23.8992, NA), IL_6_6mesesultimovalorITT = c(62.163, 
59.278, 45.1272, 19.258, 17.689, 15.864, 16.0992, 22.88964, 14.748, 
21.706), modeloterapia = structure(c(2L, 1L, 1L, 2L, 1L, 1L, 
2L, 2L, 1L, 2L), .Label = c("pdse", "tcc"), class = "factor")), row.names = c(NA, 
10L), class = "data.frame")

In the subset above, the "a02rec" variable does not matter; it is just an identifier. The variables starting with "IL_6" and "ND_IL" hold the measured serum levels, and the "modeloterapia" variable records whether the subject attended the PDSE or the TCC therapy model. I want to create the graph I mentioned in the previous comment: three graphs, one for each group (PDSE, TCC and total sample), each with some kind of boxes showing the serum levels at these time points.

I'm not sure whether a "box-like" graph or a "point/dot-like" graph would better show what I want to achieve. I'd like the graphs to show the change in serum levels between periods (initial/baseline, after treatment, 6 months and 12 months).


Solution

  • The key to using ggplot2 effectively is converting your data to a long format.

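    library(dplyr)
    library(tidyr)
    
    # Stack the IL columns into name/value pairs, then split each name on "_".
    long_data = df %>%
      pivot_longer(matches("IL")) %>%
      separate(name, sep = "_", into = c("drug", "something", "time"))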
    
    head(long_data)
    # # A tibble: 6 x 6
    #   a02rec modeloterapia drug  something time                 value
    #    <dbl> <fct>         <chr> <chr>     <chr>                <dbl>
    # 1   2925 tcc           ND    IL        6I                   156. 
    # 2   2925 tcc           ND    IL6       intermed              NA  
    # 3   2925 tcc           IL    6         6mesesultimovalorITT  62.2
    # 4   2461 pdse          ND    IL        6I                    25.4
    # 5   2461 pdse          ND    IL6       intermed              NA  
    # 6   2461 pdse          IL    6         6mesesultimovalorITT  59.3
    

    I'm unsure what some parts of your column names mean, so the new column names (drug, something, time) are guesses that hopefully you can correct.

    With data in that format, plotting is relatively straightforward. Here is an example (which looks a little weird due to the small sample of data).

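    library(ggplot2)
    
    # One box per time point, filled by the guessed "drug" column, one panel per therapy model.
    ggplot(long_data, aes(x = time, y = value, fill = drug)) +
      geom_boxplot() +
      facet_wrap(vars(modeloterapia), ncol = 1)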
    

    [Image: the resulting box plots, one panel per therapy model]

    If you need to reorder the x-axis, turn the variable on the x-axis into a factor with the levels in the order you want, as in this answer.
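
As a rough sketch of that reordering step, the code below rebuilds the posted subset, turns the time variable into a factor with explicit levels, and plots the median and interquartile range per group instead of boxes. The mapping from column names to time points (`time_labels`) and the names `il6` and `il6_summary` are assumptions made for illustration, not something the question confirms, so adjust them to the real study design.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# The subset posted in the question (a02rec is only an identifier).
df <- data.frame(
  a02rec = c(2925, 2461, 2887, 4132, 2734, 4176, 2158, 690, 4287, 2871),
  ND_IL_6I = c(156.475, 25.393, 5.20696, 29.448, 636.561,
               16.7, 20.83028, 13.04912, 17.28, 30.686),
  ND_IL6_intermed = c(NA, NA, NA, NA, NA, 4.5048, 49.654, 5.1872, 23.8992, NA),
  IL_6_6mesesultimovalorITT = c(62.163, 59.278, 45.1272, 19.258, 17.689,
                                15.864, 16.0992, 22.88964, 14.748, 21.706),
  modeloterapia = factor(c("tcc", "pdse", "pdse", "tcc", "pdse",
                           "pdse", "tcc", "tcc", "pdse", "tcc"))
)

# ASSUMPTION: which column belongs to which time point is a guess.
# The order of this vector is the order used on the x-axis.
time_labels <- c(
  ND_IL_6I = "baseline",
  ND_IL6_intermed = "intermediate",
  IL_6_6mesesultimovalorITT = "6 months"
)

# Long format: one row per subject and time point.
long_data <- df %>%
  pivot_longer(all_of(names(time_labels)),
               names_to = "column", values_to = "il6") %>%
  mutate(time = factor(unname(time_labels[column]),
                       levels = unname(time_labels)))

# Median and quartiles per therapy model and time point, ignoring missing values.
il6_summary <- long_data %>%
  group_by(modeloterapia, time) %>%
  summarise(
    n   = sum(!is.na(il6)),
    med = median(il6, na.rm = TRUE),
    q1  = quantile(il6, 0.25, na.rm = TRUE, names = FALSE),
    q3  = quantile(il6, 0.75, na.rm = TRUE, names = FALSE),
    .groups = "drop"
  )

# Point = median, vertical bar = interquartile range, lines connect the medians.
dodge <- position_dodge(width = 0.2)
p <- ggplot(il6_summary,
            aes(x = time, y = med, colour = modeloterapia,
                group = modeloterapia)) +
  geom_line(position = dodge) +
  geom_pointrange(aes(ymin = q1, ymax = q3), position = dodge) +
  labs(x = "Time point", y = "IL-6 (median and IQR)", colour = "Therapy")
p
```

Because `time` is a factor, ggplot2 draws the x-axis in the order of its levels rather than alphabetically; changing the order of `time_labels` changes the axis order. With only ten subjects the ranges are wide, so this is mainly a template to adapt to the full data.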