I have a dataset with the following structure:
study | treatment | severity | people_with_pain | sample_size |
---|---|---|---|---|
0001 | paracetamol | 5.0 | 12 | 24 |
0001 | aspirin | 7.0 | 10 | 20 |
0001 | massage | 10.2 | 20 | 21 |
0002 | paracetamol | 6.0 | . | . |
0002 | aspirin | 6.0 | . | . |
0003 | massage | 2.0 | 10 | 25 |
0003 | paracetamol | 3.5 | 10 | 25 |
I would like to evaluate severity
and the proportion of people_with_pain
within sample_size
across pairs of different treatments (e.g. paracetamol-aspirin). When the outcome is continuous (severity
), I am interested in the mean, standard deviation, median and IQR. When the outcome is dichotomous (people_with_pain/sample_size
), I am interested in the average proportion.
For instance, for the paracetamol-aspirin comparison, severity
would be 6.0 (average between study 0001, 6.0, and study 0002, 6.0. Study 0003 does not contribute because it compares massage and paracetamol). The average proportion of people_with_pain
would be 0.5 (study 0001, 22/44. Study 0002 does not contribute as there is no data).
The desired output would as a data.frame per output (e.g. one data.frame for severity
, one for proportion of people_with_pain
) with comparisons as rows (e.g. paracetamol-aspirin
):
comparison | nstudies | severity_mean | severity_sd |
---|---|---|---|
paracetamol-aspirin | 2 | 6.0 | 0 |
paracetamol-massage | 2 | 4.375 | 2.30 |
aspirin-massage | 0 | NA | NA |
I have managed to build a permutation list of available comparison and exclude those duplicated (e.g. paracetamol-aspirin and aspirin-paracetamol), as well as those not meaningful (e.g. paracetamol-paracetamol), but I am stuck there.
Here is my solution using tidyverse. I am not completely sure that this is what you want because of some differences between my results and your expected results.
However, to obtain all the combinations of treatment
I've used a left_join
of the dataframe with itself, followed by filter(treatment.x < treatment.y)
which basically eliminates all the treatment pair duplicates.
library(dplyr)
library(tidyr)
dat |>
left_join(dat, by = "study") |>
filter(treatment.x < treatment.y) |>
unite("treatment", starts_with("treatment"), sep = "-") |>
group_by(treatment) |>
summarize({
severity <- c(severity.x, severity.y)
people_with_pain <- c(people_with_pain.x, people_with_pain.y)
sample_size <- c(sample_size.x, sample_size.y)
data.frame(severity_mean = mean(severity),
severity_sd = sd(severity),
severity_median = median(severity),
severity_IQR = IQR(severity),
people_with_pain = sum(people_with_pain)/sum(sample_size),
nstudies = length(unique(study)))
})
##> treatment severity_mean severity_sd severity_median severity_IQR
##> 1 aspirin-massage 8.600 2.2627417 8.60 1.600
##> 2 aspirin-paracetamol 6.000 0.8164966 6.00 0.500
##> 3 massage-paracetamol 5.175 3.5668614 4.25 3.175
##> people_with_pain nstudies
##> 1 0.7317073 1
##> 2 NA 2
##> 3 0.5473684 2