Update: both problems were caused by typos.
summarize did not organise the output by group because of a typo in the third line of the call: median_dbp=(diastolic_bp) should have been median_dbp=median(diastolic_bp).
The boxplot did not group by drug because fill=drug was outside the aes mapping, but it should have been inside (correct code: ggplot(data=mydata, aes(x=timepoint, y=diastolic_bp, fill=drug))).

For an assignment I have the following data from a small crossover study in which two drugs, A and B, are compared for their effect on diastolic blood pressure (DBP). Each patient in the study receives both treatments in a random order, separated in time by a "wash-out" period, so that one treatment does not influence the blood pressure measurement obtained after administering the other (i.e. to rule out a carry-over effect). The data looks as follows:
library(tidyverse)
library(dplyr)
library(lubridate)
library(magrittr)
mydata <- structure(list(pt_id = c(1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7,
7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15,
16, 17, 17, 18, 18, 19, 19), timepoint = structure(c(1L, 2L,
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 2L), .Label = c("Timepoint 1", "Timepoint 2"), class = "factor"),
drug = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L), .Label = c("Drug A",
"Drug B"), class = "factor"), diastolic_bp = c(100, 112,
116, 114, 108, 110, 104, 114, 114, 98, 116, 102, 100, 96,
103, 92, 89, 103, 96, 116, 78, 127, 131, 129, 124, 106, 128,
133, 118, 108, 91, 109, 113, 98, 118, 112)), row.names = c(NA,
-36L), class = "data.frame")
My first question is about obtaining a mean and standard deviation (as well as a median and percentiles) for each treatment group per timepoint. My code:
mydata %>%
group_by(timepoint, drug) %>%
summarise(mean_dbp=mean(diastolic_bp, na.rm=TRUE),
sd_dbp=sd(diastolic_bp, na.rm=TRUE),
median_dbp=(diastolic_bp),
p25_dbp=quantile(diastolic_bp, probs=0.25),
p75_dbp=quantile(diastolic_bp, probs=0.75))
# This returns a line per patient:
# A tibble: 36 x 7
# Groups: timepoint, drug [4]
timepoint drug mean_dbp sd_dbp median_dbp p25_dbp p75_dbp
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Timepoint 1 Drug A 105. 14.1 100 96 108
2 Timepoint 1 Drug A 105. 14.1 108 96 108
3 Timepoint 1 Drug A 105. 14.1 98 96 108
4 Timepoint 1 Drug A 105. 14.1 96 96 108
5 Timepoint 1 Drug A 105. 14.1 92 96 108
6 Timepoint 1 Drug A 105. 14.1 127 96 108
7 Timepoint 1 Drug A 105. 14.1 129 96 108
8 Timepoint 1 Drug A 105. 14.1 106 96 108
9 Timepoint 1 Drug A 105. 14.1 91 96 108
10 Timepoint 1 Drug B 114. 9.64 116 110. 116.
# ... with 26 more rows
But this produces calculations for each row in the dataset. What I was expecting was one number for each combination of drug and timepoint...
Then I tried to make a boxplot per timepoint and group as follows:
ggplot(data=mydata, aes(x=timepoint, y=diastolic_bp), fill=drug) + geom_boxplot()
But this does not include the grouping variable drug.
Any help?
Perhaps this is what you want. drug needs to go inside aes:
ggplot(data=mydata, aes(x=timepoint, y=diastolic_bp, fill=drug)) + geom_boxplot()
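For the summary table, here is a minimal sketch of the corrected call, assuming the data frame mydata from the question (reproduced so the snippet runs on its own). In the original code, median_dbp=(diastolic_bp) returns every value in the group rather than a single number, so summarise produces one row per observation. Wrapping it in median() should give one row per timepoint/drug combination. The name summary_tbl and the .groups = "drop" argument (which needs dplyr 1.0.0 or later) are my additions, not part of the original code.

```r
library(dplyr)

# Data copied from the question
mydata <- structure(list(pt_id = c(1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7,
7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15,
16, 17, 17, 18, 18, 19, 19), timepoint = structure(c(1L, 2L,
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 2L), .Label = c("Timepoint 1", "Timepoint 2"), class = "factor"),
drug = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L), .Label = c("Drug A",
"Drug B"), class = "factor"), diastolic_bp = c(100, 112,
116, 114, 108, 110, 104, 114, 114, 98, 116, 102, 100, 96,
103, 92, 89, 103, 96, 116, 78, 127, 131, 129, 124, 106, 128,
133, 118, 108, 91, 109, 113, 98, 118, 112)), row.names = c(NA,
-36L), class = "data.frame")

# Every expression inside summarise() returns a single value per group,
# so the result has one row per timepoint/drug combination.
summary_tbl <- mydata %>%
  group_by(timepoint, drug) %>%
  summarise(
    mean_dbp   = mean(diastolic_bp, na.rm = TRUE),
    sd_dbp     = sd(diastolic_bp, na.rm = TRUE),
    median_dbp = median(diastolic_bp, na.rm = TRUE),  # was (diastolic_bp)
    p25_dbp    = quantile(diastolic_bp, probs = 0.25, na.rm = TRUE),
    p75_dbp    = quantile(diastolic_bp, probs = 0.75, na.rm = TRUE),
    .groups    = "drop"  # return an ungrouped tibble
  )

print(summary_tbl)
```

With two timepoints and two drugs this should return four rows instead of 36.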