Background: I have data from a simulation with a few variables, and therefore many parameter combinations. Because of the simulation's internal design, identical parameter sets can produce slightly different outcomes, so I run each set several times and calculate the min, max, and mean score. I then want to compare the treatment and no-treatment conditions for each combination by taking three differences: treatment mean minus no-treatment mean, treatment min minus no-treatment max, and treatment max minus no-treatment min.
This gives me the mean difference as well as the worst-case and best-case bounds of the comparison.
Example data:
library(dplyr)  # provides %>% and re-exports tribble()

my_data <- tribble(
  ~params,   ~treatment, ~mean_score, ~min_score, ~max_score,
  "combo a", 0,          91,          90,         92,
  "combo a", 1,          92,          92,         92,
  "combo b", 0,          89,          87,         91,
  "combo b", 1,          92,          89,         92,
  "combo c", 0,          90,          90,         90,
  "combo c", 1,          89,          85,         93,
)
Blowing the dust off my R skills, my initial attempt is the following, but I do not know how to tell summarize which row should be subtracted from which within the grouping.
Code attempt I know doesn't work:
my_summ_data <- my_data %>%
  dplyr::group_by(params = as.factor(params)) %>%
  dplyr::summarize(hier_diff = diff(mean_score),
                   min_max_diff = diff(c(min_score, max_score)),
                   max_min_diff = diff(c(max_score, min_score)))
I would like to get
params | hier_diff | min_max_diff | max_min_diff |
---|---|---|---|
combo a | 1 | 0 | 2 |
combo b | 3 | -2 | 5 |
combo c | -1 | -5 | 3 |
but instead I get (btw I don't yet understand why I get these extra rows)
params | hier_diff | min_max_diff | max_min_diff |
---|---|---|---|
combo a | 1 | 2 | 0 |
combo a | 1 | 0 | -2 |
combo a | 1 | 0 | 2 |
combo b | 1 | 2 | 0 |
combo b | 1 | 2 | -4 |
combo b | 1 | 0 | 2 |
combo c | 2 | -2 | 6 |
combo c | 2 | 2 | -6 |
combo c | 2 | 6 | -2 |
I'm not convinced there is a sensible way to do what I want using summarize. But if there is, I would like to know it, and if not, what is the next best alternative?
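On the extra rows: a likely explanation, assuming dplyr 1.0 or later (where summarize keeps every element of a multi-value result), is that `diff()` returns one lagged difference per adjacent pair of its input. Within each group, `c(min_score, max_score)` has four elements, so `diff()` returns three values and the group gets three rows, with the single `hier_diff` value repeated to match. A minimal base-R sketch using the "combo a" values (the names `combined` and `lagged` are only for illustration):

```r
# "combo a" values in row order (treatment 0, then treatment 1)
min_score <- c(90, 92)
max_score <- c(92, 92)

# c() glues the two columns into one length-4 vector: 90 92 92 92
combined <- c(min_score, max_score)

# diff() subtracts each element from the one after it, giving three values
lagged <- diff(combined)
print(lagged)           # 2 0 0

# diff() on the two mean scores has only one adjacent pair, so one value
print(diff(c(91, 92)))  # 1
```

So `diff()` never pairs a treatment row with a no-treatment row across columns; it only walks along the concatenated vector.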
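Inside summarize, you can subset each column by treatment to state explicitly which row is subtracted from which:
my_data %>%
  dplyr::group_by(params = as.factor(params)) %>%
  dplyr::summarize(
    # treatment mean minus no-treatment mean
    hier_diff = mean_score[treatment == 1] - mean_score[treatment == 0],
    # worst case: treatment min minus no-treatment max
    min_max_diff = min_score[treatment == 1] - max_score[treatment == 0],
    # best case: treatment max minus no-treatment min
    max_min_diff = max_score[treatment == 1] - min_score[treatment == 0]
  )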
Result
# A tibble: 3 x 4
params hier_diff min_max_diff max_min_diff
<fct> <dbl> <dbl> <dbl>
1 combo a 1 0 2
2 combo b 3 -2 5
3 combo c -1 -5 3
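To see what happens inside a single group, here is a small base-R sketch of the same subsetting for "combo a" alone. It assumes each group holds exactly one treatment row and one no-treatment row; with more or fewer rows per condition, the subtraction would no longer return a single value.

```r
# One group ("combo a") written out as plain vectors
treatment  <- c(0, 1)
mean_score <- c(91, 92)
min_score  <- c(90, 92)
max_score  <- c(92, 92)

# treatment == 1 is a logical vector that selects rows by condition
hier_diff    <- mean_score[treatment == 1] - mean_score[treatment == 0]
min_max_diff <- min_score[treatment == 1] - max_score[treatment == 0]
max_min_diff <- max_score[treatment == 1] - min_score[treatment == 0]

print(c(hier_diff, min_max_diff, max_min_diff))  # 1 0 2
```

The three numbers match the "combo a" row of the result above.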
Note that the answer is the same even if the treatment rows appear before the no-treatment rows, e.g.:
my_data <- tribble(
  ~params,   ~treatment, ~mean_score, ~min_score, ~max_score,
  "combo a", 1,          92,          92,         92,  # swapped rows 1+2, 3+4, 5+6
  "combo a", 0,          91,          90,         92,
  "combo b", 1,          92,          89,         92,
  "combo b", 0,          89,          87,         91,
  "combo c", 1,          89,          85,         93,
  "combo c", 0,          90,          90,         90,
)
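As for a next-best alternative to summarize, one option is to reshape the data so each combination sits on a single row, then subtract columns directly. This is only a sketch, assuming tidyr is installed and that every combination has exactly one row per treatment value; the column names `mean_score_0`, `mean_score_1`, and so on come from `pivot_wider()`'s default naming, and `wide` and `result` are illustrative names.

```r
library(dplyr)
library(tidyr)

my_data <- tribble(
  ~params,   ~treatment, ~mean_score, ~min_score, ~max_score,
  "combo a", 1,          92,          92,         92,
  "combo a", 0,          91,          90,         92,
  "combo b", 1,          92,          89,         92,
  "combo b", 0,          89,          87,         91,
  "combo c", 1,          89,          85,         93,
  "combo c", 0,          90,          90,         90
)

# One row per combination; score columns get a _0 or _1 suffix by treatment
wide <- my_data %>%
  pivot_wider(
    names_from  = treatment,
    values_from = c(mean_score, min_score, max_score)
  )

# Plain column arithmetic, independent of the original row order
result <- wide %>%
  mutate(
    hier_diff    = mean_score_1 - mean_score_0,
    min_max_diff = min_score_1 - max_score_0,
    max_min_diff = max_score_1 - min_score_0
  ) %>%
  select(params, hier_diff, min_max_diff, max_min_diff)

print(result)
```

The wide layout makes the pairing explicit in the column names, at the cost of an extra reshaping step.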