
Descriptive statistics and boxplot for repeated measurements?



Update: the problems were caused by typos.

  • Question 1: summarise did not return one row per group because of a typo in the third line of the summarise call: median_dbp=(diastolic_bp) should have been median_dbp=median(diastolic_bp).
  • Question 2: the boxplot was not grouped by drug because fill=drug was placed outside the aes mapping instead of inside it (correct code: ggplot(data=mydata, aes(x=timepoint, y=diastolic_bp, fill=drug))).
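
For question 1, the corrected call can be sketched as follows. To keep the example short and runnable on its own, it uses only the first eight rows of the data listed further down, stored under the assumed name bp; the na.rm arguments on median and quantile and the .groups argument are my additions, not part of the original code. The point is that every expression inside summarise should reduce the group to a single value.

```r
library(dplyr)

# First eight rows of the data set from the question (assumed name: bp)
bp <- data.frame(
  pt_id        = c(1, 1, 2, 2, 3, 3, 4, 5),
  timepoint    = factor(c("Timepoint 1", "Timepoint 2", "Timepoint 1", "Timepoint 2",
                          "Timepoint 1", "Timepoint 2", "Timepoint 2", "Timepoint 1")),
  drug         = factor(c("Drug A", "Drug B", "Drug B", "Drug A",
                          "Drug A", "Drug B", "Drug A", "Drug B")),
  diastolic_bp = c(100, 112, 116, 114, 108, 110, 104, 114)
)

bp_summary <- bp %>%
  group_by(timepoint, drug) %>%
  summarise(
    mean_dbp   = mean(diastolic_bp, na.rm = TRUE),
    sd_dbp     = sd(diastolic_bp, na.rm = TRUE),
    # median() was missing in the question, so the raw vector was returned
    median_dbp = median(diastolic_bp, na.rm = TRUE),
    p25_dbp    = quantile(diastolic_bp, probs = 0.25, na.rm = TRUE),
    p75_dbp    = quantile(diastolic_bp, probs = 0.75, na.rm = TRUE),
    .groups    = "drop"  # return an ungrouped tibble
  )

# One row per combination of timepoint and drug
print(bp_summary)
```

With the full mydata in place of bp, the same call should give four rows, one for each combination of timepoint and drug.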

For an assignment I have the following data from a small crossover study in which two drugs, A and B, are compared for their effect on diastolic blood pressure (DBP). Each patient in the study receives both treatments in a random order, separated in time by a “wash-out” period, so that one treatment does not influence the blood pressure measured after administering the other (i.e. to rule out a carry-over effect). The data look as follows:

library(tidyverse)
library(dplyr)
library(lubridate)
library(magrittr)

mydata <- structure(list(pt_id = c(1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 
7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 
16, 17, 17, 18, 18, 19, 19), timepoint = structure(c(1L, 2L, 
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L), .Label = c("Timepoint 1", "Timepoint 2"), class = "factor"), 
    drug = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 
    2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 
    1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L), .Label = c("Drug A", 
    "Drug B"), class = "factor"), diastolic_bp = c(100, 112, 
    116, 114, 108, 110, 104, 114, 114, 98, 116, 102, 100, 96, 
    103, 92, 89, 103, 96, 116, 78, 127, 131, 129, 124, 106, 128, 
    133, 118, 108, 91, 109, 113, 98, 118, 112)), row.names = c(NA, 
-36L), class = "data.frame")

My first question is about obtaining the mean and standard deviation (as well as the median and percentiles) for each treatment group per timepoint. My code:

mydata %>% 
  group_by(timepoint, drug) %>% 
  summarise(mean_dbp=mean(diastolic_bp, na.rm=TRUE), 
            sd_dbp=sd(diastolic_bp, na.rm=TRUE), 
            median_dbp=(diastolic_bp), 
            p25_dbp=quantile(diastolic_bp, probs=0.25), 
            p75_dbp=quantile(diastolic_bp, probs=0.75))

# This returns a line per patient:
# A tibble: 36 x 7
# Groups:   timepoint, drug [4]
   timepoint   drug   mean_dbp sd_dbp median_dbp p25_dbp p75_dbp
   <fct>       <fct>     <dbl>  <dbl>      <dbl>   <dbl>   <dbl>
 1 Timepoint 1 Drug A     105.  14.1         100     96     108 
 2 Timepoint 1 Drug A     105.  14.1         108     96     108 
 3 Timepoint 1 Drug A     105.  14.1          98     96     108 
 4 Timepoint 1 Drug A     105.  14.1          96     96     108 
 5 Timepoint 1 Drug A     105.  14.1          92     96     108 
 6 Timepoint 1 Drug A     105.  14.1         127     96     108 
 7 Timepoint 1 Drug A     105.  14.1         129     96     108 
 8 Timepoint 1 Drug A     105.  14.1         106     96     108 
 9 Timepoint 1 Drug A     105.  14.1          91     96     108 
10 Timepoint 1 Drug B     114.   9.64        116    110.    116.
# ... with 26 more rows

But this produces a row for every observation in the dataset. What I was expecting was one number for each combination of drug and timepoint.

Then I tried to make a boxplot per timepoint and group as follows:

ggplot(data=mydata, aes(x=timepoint, y=diastolic_bp), fill=drug) + geom_boxplot()

But this does not include the grouping variable drug.

Any help?


Solution

  • Perhaps this is what you want. fill=drug needs to go inside aes() so that it is treated as an aesthetic mapping.

    ggplot(data=mydata, aes(x=timepoint, y=diastolic_bp, fill=drug)) + geom_boxplot()
    

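
To check the effect of moving fill=drug into aes() without looking at the picture, one can count the boxes that ggplot2 computes. This is only a sketch: it uses the first eight rows of the data from the question (stored under the assumed name bp) and layer_data() from ggplot2, which returns the statistics computed for a layer, one row per box in the case of geom_boxplot.

```r
library(ggplot2)

# First eight rows of the data set from the question (assumed name: bp)
bp <- data.frame(
  pt_id        = c(1, 1, 2, 2, 3, 3, 4, 5),
  timepoint    = factor(c("Timepoint 1", "Timepoint 2", "Timepoint 1", "Timepoint 2",
                          "Timepoint 1", "Timepoint 2", "Timepoint 2", "Timepoint 1")),
  drug         = factor(c("Drug A", "Drug B", "Drug B", "Drug A",
                          "Drug A", "Drug B", "Drug A", "Drug B")),
  diastolic_bp = c(100, 112, 116, 114, 108, 110, 104, 114)
)

# No fill mapping: one box per timepoint
p_no_fill <- ggplot(bp, aes(x = timepoint, y = diastolic_bp)) +
  geom_boxplot()

# fill = drug inside aes(): one box per timepoint and drug
p_fill <- ggplot(bp, aes(x = timepoint, y = diastolic_bp, fill = drug)) +
  geom_boxplot()

n_boxes_no_fill <- nrow(layer_data(p_no_fill))  # 2 boxes
n_boxes_fill    <- nrow(layer_data(p_fill))     # 4 boxes
```

In the question's call, fill=drug sat outside aes(), so it was never turned into a mapping and the plot was built as in the first case.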