This is a follow-up to "Reordering factor gives different results, depending on which packages are loaded", with another, related question.
@Andrie's answer there is the correct one and, following @David Lovell's comment, I am a third soul confused by this.
In my case it was because I had loaded ROCR, which depends on gplots, which depends on gdata. I hadn't even heard of gdata (to illustrate my ignorance), and therefore didn't think to search for it.
I've discovered another quirk, which made it even more difficult to work out in my case, and is the point of this question. Something in gdata:::reorder.factor handles whole-number values and values with a fractional part differently. To illustrate:
library(gdata)
x <- factor(letters[1:6])
y <- c(1,4,3,5,6,2)
z <- c(1.1,2.4,1.3,2.5,2.6,1.2)
stats:::reorder.default(x, y, function(X)-X) #edbcfa - correct
stats:::reorder.default(x, z, function(X)-X) #edbcfa
stats:::reorder.default(x, -y) #edbcfa
stats:::reorder.default(x, -z) #edbcfa
gdata:::reorder.factor(x, y, function(X)-X) #edbcfa
gdata:::reorder.factor(x, z, function(X)-X) #bdeafc - weird
gdata:::reorder.factor(x, -y) #abcdef - no reordering
gdata:::reorder.factor(x, -z) #abcdef - no reordering
It's mostly the bdeafc that I'm interested in. It has got the bit before the decimal correct, in that the 2.x are before the 1.x, but the part after the decimal point is in normal order, not reverse order: x.1 before x.2 before x.3.
Why is this?
Hm, this seems to be because gdata:::reorder.factor takes an argument named sort, which defaults to mixedsort. The mixedsort function in turn uses the mixedorder function from package gtools. By loading gtools and reading ?mixedorder, you can find out why this is happening:
?mixedorder
Order or Sort strings with embedded numbers so that the numbers are in the correct order:
These functions sort or order character strings containing numbers so that the numbers are numerically sorted rather than sorted by character value. I.e. "Asprin 50mg" will come before "Asprin 100mg". In addition, case of character strings is ignored so that "a", will come before "B" and "C".
Also, ?reorder.factor clearly states this:
?gdata:::reorder.factor
If sort is provided (as it is by default): The new factor level names are generated by applying the supplied function to the existing factor level names. With sort=mixedsort the factor levels are sorted so that combined numeric and character strings are sorted in according to character rules on the character sections (including ignoring case), and the numeric rules for the numeric sections. See mixedsort for details.
You'll have to pass NULL as the sort argument so that the default, mixedsort, is not used:
gdata:::reorder.factor(x, z, function(X)-X, sort=NULL)
# [1] a b c d e f
# Levels: e d b c f a
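A likely reason sort=NULL helps is an R lookup rule: when a name is used as a function call, R skips any binding of that name that is not a function and keeps searching the enclosing environments. The sketch below shows that rule with a toy function. The name order_names and its default of rev are hypothetical and are not gdata code.

```r
# Hypothetical stand-in for a function that has a `sort` argument (not gdata code)
order_names <- function(v, sort = rev) {
  # `sort(v)` is a call, so R looks for a *function* named sort,
  # skipping non-function bindings such as NULL
  names(sort(v))
}

v <- c(a = 3, b = 1, c = 2)

with_default <- order_names(v)               # the default (rev) is used
with_null    <- order_names(v, sort = NULL)  # NULL is skipped, base::sort is found

with_default
with_null
```

With the default, the names come back reversed. With sort=NULL, they come back in the order base::sort produces.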
Alternatively, as @BenBolker points out in the comments, you can pass the function sort itself as the sort argument:
gdata:::reorder.factor(x, z, function(X)-X, sort=sort)
For the future, debugonce is your friend for this sort of thing. By doing
debugonce(gdata:::reorder.factor)
gdata:::reorder.factor(x, z, function(X)-X)
(and hitting enter and inspecting the output) you can find that the issue comes from the last few lines that are being run:
else if (!missing(FUN))
new.order <- names(sort(tapply(X, x, FUN, ...)))
For your data,
> X
# [1] 1.1 2.4 1.3 2.5 2.6 1.2
> x
# [1] a b c d e f
# Levels: a b c d e f
And tapply(...) gives:
> tapply(X, x, FUN, ...)
# a b c d e f
# -1.1 -2.4 -1.3 -2.5 -2.6 -1.2
Here, the "sort" should give:
> base:::sort(tapply(X, x, FUN, ...))
# e d b c f a
# -2.6 -2.5 -2.4 -1.3 -1.2 -1.1
But it gives:
# b d e a f c
# -2.4 -2.5 -2.6 -1.1 -1.2 -1.3
This is because the "sort" that's being called is not from base, which can be seen by typing "sort" from within the debugger:
> sort # from within the function call (using debugonce)
# function (x)
# x[mixedorder(x)]
# <environment: namespace:gtools>
mixedorder is a function from package gtools. Since the command fetches the names of the sorted result and the sorting is wrong, the wrong order is fetched. So basically the issue is that the sort being called is mixedsort and not base:::sort.
It's easy to verify this by installing gtools and doing:
require(gtools)
gtools:::mixedorder(c(-2.4, -2.5, -2.6))
# [1] 1 2 3
order(c(-2.4, -2.5, -2.6))
# [1] 3 2 1
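Both the 1 2 3 above and the earlier b d e a f c ordering fit one plausible reading: the string-aware sort treats "-2.4" as a signed integer piece ("-2"), a separator, and a second integer piece ("4"), and orders by each piece in turn. This is an assumption about how that version of mixedorder tokenises its input, not something taken from the gtools source, and other gtools versions may behave differently. A base R sketch that emulates the assumed rule:

```r
# Scores as returned by tapply(X, x, FUN, ...) in the debugging session
scores <- c(a = -1.1, b = -2.4, c = -1.3, d = -2.5, e = -2.6, f = -1.2)

# ASSUMPTION: each value is split at the decimal point into two integer pieces
parts     <- strsplit(as.character(scores), ".", fixed = TRUE)
int_part  <- as.numeric(vapply(parts, `[`, character(1), 1))  # signed piece before the point
frac_part <- as.numeric(vapply(parts, `[`, character(1), 2))  # digits after the point

# Order by the signed integer piece first, then by the digits after the point
emulated <- names(scores)[order(int_part, frac_part)]
emulated
```

Under that assumption, the -2.x values come before the -1.x values, but within each group the digits after the point run in increasing order, which is the pattern the question asks about.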
Therefore, you'll have to provide sort=NULL to make sure this doesn't happen.
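If you would rather not depend on which reorder method gets dispatched, the same reordering can be done with base R alone. A minimal sketch using the x and z from the question (the names scores, new_levels and x_reordered are just illustrative):

```r
x <- factor(letters[1:6])
z <- c(1.1, 2.4, 1.3, 2.5, 2.6, 1.2)

# One score per level; each level occurs once here, so FUN simply negates the value
scores <- tapply(z, x, function(X) -X)

# base::sort is named explicitly so that no other sort function can be picked up
new_levels <- names(base::sort(scores))

# Same values as x, with the levels rearranged by score
x_reordered <- factor(x, levels = new_levels)
levels(x_reordered)
```

This gives the levels e d b c f a, matching the stats:::reorder.default results in the question.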