Is it feasible to test whether some dataframe is simply a sorted version of another dataframe? For example, if I have two dataframes a and b, is there some way to easily determine whether a is simply a reordered version of b (or vice versa)?
Here's a trivial example:
a <- data.frame(x1=1:10, x2=11:20, x3=1:2)
b <- a[order(a$x3, a$x1, decreasing=TRUE),]
The closest thing I can think of is all.equal, but its output is not helpful (to me, at least):
> all.equal(a,b)
[1] "Attributes: < Component 2: Mean relative difference: 0.9545455 >"
[2] "Component 1: Mean relative difference: 0.9545455"
[3] "Component 2: Mean relative difference: 0.3387097"
[4] "Component 3: Mean relative difference: 0.6666667"
I imagine there is some obvious way to do this that is eluding me. I'm looking for a general solution that would scale well to many variables and many observations (thus the above example is simply for demonstration).
Also: Ideally, such a function would also identify whether a is a subset of b (or vice versa).
I would explore the "compare" package:
library(compare)
compare(a, b, allowAll=TRUE)
# TRUE
# sorted
Here, the "sorted" note shows that compare had to sort the rows before it found the two data frames to be the same.
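If you would rather not depend on a package, the same sort-then-compare idea can be sketched in base R. This is only a rough illustration, not what compare does internally: is_reordered is a hypothetical helper name, and it assumes both data frames have the same columns in the same order and the same column types (so it is stricter than allowAll=TRUE).

```r
# Hypothetical helper (not part of the compare package): TRUE when `b`
# holds the same rows as `a`, possibly in a different order.
is_reordered <- function(a, b) {
  # Assumption: dimensions and column names must match exactly.
  if (!identical(dim(a), dim(b)) || !identical(names(a), names(b))) {
    return(FALSE)
  }
  # Sort a data frame by all of its columns, left to right.
  sort_rows <- function(d) {
    d <- d[do.call(order, unname(as.list(d))), , drop = FALSE]
    rownames(d) <- NULL  # discard the original row labels
    d
  }
  # all.equal() returns TRUE or a character description of differences.
  isTRUE(all.equal(sort_rows(a), sort_rows(b)))
}

a <- data.frame(x1 = 1:10, x2 = 11:20, x3 = 1:2)
b <- a[order(a$x3, a$x1, decreasing = TRUE), ]
is_reordered(a, b)
# [1] TRUE
```

Note that all.equal compares numeric columns with a small tolerance; swap in identical if you need exact equality.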
Here's a slightly more complicated example, with factors coerced to character, rows reordered, and columns reordered:
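# stringsAsFactors = TRUE makes x4 a factor (this was the default before R 4.0.0)
a <- data.frame(x1=1:10, x2=11:20, x3=1:2, x4 = letters[1:10], stringsAsFactors = TRUE)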
b <- with(a, a[order(x3, x1, decreasing=TRUE), ])
b$x4 <- as.character(b$x4)
b <- b[c(4, 1, 3, 2)]
Here's the result of compare:
compare(a, b, allowAll=TRUE)
# TRUE
# reordered columns
# [x4] coerced from <character> to <factor>
# sorted
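As for the subset part of the question, I'm not certain how well compare covers it, so here is a rough base R sketch instead. is_row_subset is a hypothetical helper name. It assumes both data frames share the same column names, it compares values as text (so a factor and a character column with the same labels match), and it ignores how many times a duplicated row occurs.

```r
# Hypothetical helper: TRUE when every row of `a` also occurs in `b`.
# Duplicate rows are not counted, and values are compared as text.
is_row_subset <- function(a, b) {
  if (!setequal(names(a), names(b))) return(FALSE)
  b <- b[names(a)]  # align the column order of `b` with `a`
  # Collapse each row into a single string key.
  row_key <- function(d) do.call(paste, c(unname(as.list(d)), sep = "\r"))
  all(row_key(a) %in% row_key(b))
}

a <- data.frame(x1 = 1:10, x2 = 11:20, x3 = 1:2)
is_row_subset(a[c(7, 2, 5), ], a)  # three rows of a, shuffled
# [1] TRUE
is_row_subset(a, a[1:5, ])         # a is not contained in its first five rows
# [1] FALSE
```

Pasting rows into keys is simple but can be slow or memory-hungry on very large data, so treat it as a starting point rather than a tuned solution.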