I always thought that the cld column in the output of microbenchmark was a statistical ranking of the speed. However, this is not true:
> microbenchmark(
+ intmap = fintmap(), # slower
+ List = flist(),
+ times = 5
+ )
Unit: microseconds
   expr     min      lq      mean  median       uq      max neval cld
 intmap 793.984 910.539 1145.8608 911.840 1290.529 1822.412     5  a 
   List   1.092   1.318  201.3712   1.639    3.660  999.147     5   b
So what is it? The documentation only says it is a statistical ranking, but of what?
Or maybe it is a multiple comparison test of the speeds, and the unequal standard deviations cause this result? There is clearly an outlier in the second benchmark.
It seems that my question was not clear. I know the meaning of the letters a and b; this is the classical way to report a Tukey test. But the results are not coherent here: intmap is slower but is ranked first.
The cld is a Compact Letter Display brought over from the package multcomp.
From that package: "Equal letters indicate no significant differences."
What I can't currently determine is whether it's meant to be ranked or just classified, i.e. is a meant to be generally faster than b, or just different?
The code in microbenchmark::summary is:
ops <- options(warn=-1)
mdl <- lm(time ~ expr, object)
comp <- multcomp::glht(mdl, multcomp::mcp(expr = "Tukey"))
res$cld <- multcomp::cld(comp)$mcletters$monospacedLetters
So from that, it appears to fit a linear model with lm() to the raw times (not the means etc.), then set up a multiple comparisons object with glht() for all-pair comparisons, then reduce that to a cld using cld().
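The three steps can be tried outside microbenchmark. The following is a minimal sketch, assuming the multcomp package is installed; the data frame timings, its group names and the simulated times are made up for illustration and only mimic the expr and time columns that the quoted code relies on.

```r
# Sketch only: simulated timings stand in for a real microbenchmark object.
library(multcomp)

set.seed(42)
n <- 50
timings <- data.frame(
  expr = factor(rep(c("slowA", "slowB", "fast"), each = n),
                levels = c("slowA", "slowB", "fast")),
  time = c(rnorm(n, mean = 1000, sd = 50),  # hypothetical slow expression
           rnorm(n, mean = 1005, sd = 50),  # hypothetical, about as slow
           rnorm(n, mean = 10,   sd = 2))   # hypothetical fast expression
)

# Same three steps as in the summary code quoted above
mdl  <- lm(time ~ expr, data = timings)           # models the mean time per expression
comp <- glht(mdl, linfct = mcp(expr = "Tukey"))   # all pairwise comparisons of those means
cld_letters <- cld(comp)$mcletters$Letters        # named character vector, one entry per expression

print(cld_letters)
```

Expressions that share a letter are not significantly different at the default level. Which group ends up with a is deliberately not stated here, since that is exactly the open question and may depend on the order of the factor levels and on the multcomp version.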
EDIT: Testing ranking:
a <- rnorm(1000)
a
microbenchmark(
alpha = mean(a),
beta = a/length(a) |> sum(), # note: |> binds tighter than /, so this is a / sum(length(a))
gamma = sum(a) / length(a),
times = 10000,
unit = "nanoseconds"
)
Unit: nanoseconds
  expr  min   lq    mean median   uq      max neval cld
 alpha 4700 5500 6325.56   5700 6800    37700 10000  a 
  beta 1700 2700 5307.55   2900 3300 12419800 10000  a 
 gamma  900 1100 1240.32   1100 1300    24000 10000   b
microbenchmark(
gamma = sum(a) / length(a),
alpha = mean(a),
beta = a/length(a) |> sum(),
times = 10000,
unit = "nanoseconds"
)
Unit: nanoseconds
  expr  min   lq    mean median   uq      max neval cld
 gamma  900 1100 1214.29   1100 1200    23700 10000  a 
 alpha 4900 5500 6039.82   5700 6200    71900 10000   b
  beta 1700 2500 5459.20   3000 3200 12272900 10000   b
This would appear to demonstrate that, as suspected, the entries in the table are listed in the order they were provided to microbenchmark(), and the cld letters are assigned sequentially based on this order, NOT by the overall speed ranking.
EDIT 2: Playing with ordering:
d <- microbenchmark(
alpha = mean(a),
beta = a/length(a) |> sum(),
gamma = sum(a + a - a) / length(a),
times = 10000,
unit = "nanoseconds"
)
print(d, order = "cld")
Unit: nanoseconds
  expr  min   lq    mean median   uq     max neval cld
  beta 1700 1900 2386.04   2000 2300   53400 10000   b
 alpha 5000 5500 6219.35   5700 6400   72700 10000  a 
 gamma 1900 2200 4378.53   2400 2600 8532200 10000  ab
It looks to me like print(d, order = "cld") sorts the cld strings alphabetically, treating each letter position as a column: it sorts by the a column first (blanks at the top), then by the b column (ditto), and so on.
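This reading can be checked in base R. The sketch below assumes that the cld column holds the monospaced letter strings (one character position per letter, blank when the letter is absent) and that they are ordered as plain strings; the data frame res is a hypothetical stand-in built from the table above, not a real microbenchmark summary.

```r
# Values copied from the table above, listed in the order given to microbenchmark()
res <- data.frame(
  expr = c("alpha", "beta", "gamma"),
  cld  = c("a ", " b", "ab"),  # assumed monospaced form: position 1 is "a", position 2 is "b"
  stringsAsFactors = FALSE
)

# method = "radix" compares strings in the C locale, where a space sorts before any letter,
# so a blank in the first position comes before "a", and ties move on to the next position.
sorted <- res[order(res$cld, method = "radix"), ]

print(sorted$expr)
```

Under these assumptions the result is beta, alpha, gamma, the same order as in the printed table. With the default ordering method the outcome may depend on the locale's collation rules, which is why the radix method is used in this sketch.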