I have a data set with three columns: workload, cfg (configuration), and perf (performance).
Here is a toy data set to illustrate my problem. The performance values are meaningless; I just picked distinct integers to make the example easy to follow. In reality they would be floating-point values from performance measurements:
  workload cfg perf
1        a   1    1
2        b   1    2
3        a   2    3
4        b   2    4
5        a   3    5
6        b   3    6
7        a   4    7
8        b   4    8
You can generate it using:
dframe <- data.frame(workload = rep(letters[1:2], 4),
                     cfg = rep(seq_len(4), each = 2),
                     perf = seq_len(8))
I am trying to compute the harmonic speedup for the different configurations. This requires a base configuration (cfg = 1 in this example). The harmonic speedup is then computed as:
HS(cfg_i) = num_workloads / sum_{j=1}^{num_workloads} [perf(cfg_base, wl_j) / perf(cfg_i, wl_j)]
For instance, for configuration 2 it would be:
HS(cfg_2) = 2 / [perf(cfg_1, wl_1) / perf(cfg_2, wl_1) +
                 perf(cfg_1, wl_2) / perf(cfg_2, wl_2)]
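As a quick check of the formula, here is a minimal sketch that evaluates it by hand for configuration 2 on the toy data. The variable names base, cur and hs_cfg2 are only illustrative, and the code assumes the rows of each configuration list the workloads in the same order.

```r
# Toy data from the question: two workloads (a, b), four configurations.
dframe <- data.frame(workload = rep(letters[1:2], 4),
                     cfg = rep(seq_len(4), each = 2),
                     perf = seq_len(8))

# Performance of the base configuration and of configuration 2,
# one value per workload (assumes the same workload order in both).
base <- dframe$perf[dframe$cfg == 1]   # 1, 2
cur  <- dframe$perf[dframe$cfg == 2]   # 3, 4

# HS(cfg_2) = num_workloads / sum(base / cur) = 2 / (1/3 + 2/4)
hs_cfg2 <- length(cur) / sum(base / cur)
print(hs_cfg2)
```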
I would like to compute the harmonic speedup for every workload pair and configuration. For the example data set, the result would be:
  workload.pair cfg harmonic.speedup
1           a-b   1 2 / (1/1 + 2/2) = 1
2           a-b   2 2 / (1/3 + 2/4) = 2.4
3           a-b   3 2 / (1/5 + 2/6) = 3.75
4           a-b   4 2 / (1/7 + 2/8) = 5.09
I am struggling with aggregate and ddply to find a solution that does not use loops, but I have not been able to come up with one that works. My basic problem is that I do not really know how to express this computation with an R function such as aggregate or ddply (if it is possible at all).
Does anyone know how this can be solved?
EDIT: I was afraid that using 1..8 as perf could lead to some confusion. I did that for the sake of simplicity, but the values do not need to be those (for instance, imagine initializing them with dframe$perf <- runif(8)). Both James's and Zach's answers misunderstood that part of my question, so I thought it better to clarify it here. Anyway, I generalized both answers to deal with the case where the performance for configuration 1 is not (1, 2).
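A minimal sketch of what such a generalization might look like (an illustration only, not the exact code from either answer): it joins every row to its baseline by workload name, so it depends neither on row order nor on the baseline being (1, 2). The helper name harmonic_speedup, its base_cfg argument and the fixed seed are assumptions made for this example.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
dframe <- data.frame(workload = rep(letters[1:2], 4),
                     cfg = rep(seq_len(4), each = 2),
                     perf = runif(8))

# Hypothetical helper: harmonic speedup of every cfg relative to base_cfg.
harmonic_speedup <- function(df, base_cfg = 1) {
  # One baseline performance value per workload.
  base <- df[df$cfg == base_cfg, c("workload", "perf")]
  names(base)[2] <- "base_perf"
  # Attach the matching baseline value to every row, keyed by workload.
  merged <- merge(df, base, by = "workload")
  merged$ratio <- merged$base_perf / merged$perf
  # Per configuration: number of workloads divided by the sum of ratios.
  out <- aggregate(ratio ~ cfg, data = merged,
                   FUN = function(r) length(r) / sum(r))
  names(out)[2] <- "harmonic.speedup"
  out
}

res <- harmonic_speedup(dframe)
print(res)
```

By construction the base configuration always gets a harmonic speedup of 1, which is a handy sanity check.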
Try this:
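library(plyr)
# perf of the base configuration (cfg = 1), one value per workload
baseline <- dframe[dframe$cfg == 1, ]$perf
# harmonic speedup of one configuration; assumes x lists the workloads
# in the same order as baseline
hspeed <- function(x) length(x) / sum(baseline / x)
ddply(dframe, .(cfg), summarise,
      workload.pair = paste(workload, collapse = "-"),
      harmonic.speedup = hspeed(perf))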
  cfg workload.pair harmonic.speedup
1   1           a-b         1.000000
2   2           a-b         2.400000
3   3           a-b         3.750000
4   4           a-b         5.090909
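If plyr is not available, the same numbers can be reproduced with a base R sketch using split and sapply. It keeps the baseline and hspeed definitions from above, and so inherits the same assumption: within each cfg the workloads appear in the same order as in the baseline. The variable name hs is only illustrative.

```r
# Toy data from the question.
dframe <- data.frame(workload = rep(letters[1:2], 4),
                     cfg = rep(seq_len(4), each = 2),
                     perf = seq_len(8))

baseline <- dframe$perf[dframe$cfg == 1]
hspeed <- function(x) length(x) / sum(baseline / x)

# Split perf by configuration and apply hspeed to each group;
# the result is a numeric vector named by cfg.
hs <- sapply(split(dframe$perf, dframe$cfg), hspeed)
print(hs)
```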