Tags: r, sorting, comparison, dataframe

Test whether a dataframe is a sorted version of another dataframe


Is it feasible to test whether some dataframe is simply a sorted version of another dataframe? For example, if I have two dataframes a and b, is there some way to easily determine whether a is simply a reordered version of b (or vice versa)?

Here's a trivial example:

a <- data.frame(x1=1:10, x2=11:20, x3=1:2)
b <- a[order(a$x3, a$x1, decreasing=TRUE),]

The closest thing I can think of is all.equal, but its output is not helpful (to me, at least):

> all.equal(a,b)
[1] "Attributes: < Component 2: Mean relative difference: 0.9545455 >"
[2] "Component 1: Mean relative difference: 0.9545455"                
[3] "Component 2: Mean relative difference: 0.3387097"                
[4] "Component 3: Mean relative difference: 0.6666667"

I imagine there is some obvious way to do this that is eluding me. I'm looking for a general solution that would scale well to many variables and many observations (the example above is only for demonstration).

Also: Ideally, such a function would also identify whether a is a subset of b (or vice versa).


Solution

  • I would explore the "compare" package:

    library(compare)
    compare(a, b, allowAll=TRUE)
    # TRUE
    #   sorted
    

    Here, the "sorted" note in the output shows that compare had to sort the rows before it found the two data frames to be the same.

    Here's a slightly more complicated example, with factors coerced to character, rows reordered, and columns reordered:

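    # stringsAsFactors=TRUE keeps x4 a factor; since R 4.0.0 the default is FALSE
    a <- data.frame(x1=1:10, x2=11:20, x3=1:2, x4 = letters[1:10], stringsAsFactors=TRUE)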
    b <- with(a, a[order(x3, x1, decreasing=TRUE), ])
    b$x4 <- as.character(b$x4)
    b <- b[c(4, 1, 3, 2)]
    

    Here's the result of compare:

    compare(a, b, allowAll=TRUE)
    # TRUE
    #   reordered columns
    #   [x4] coerced from <character> to <factor>
    #   sorted
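
If installing a package is not an option, the same two checks (reordered rows, and the subset case from the question) can be approximated in base R. The following is only a minimal sketch: `is_row_permutation`, `is_row_subset`, and `row_keys` are hypothetical helper names, not part of compare or base R. It assumes both data frames have the same column names and column types. The subset check builds string keys with `paste`, so it can be misled by values that print identically (for example `1` and `"1"`) or by floating-point columns.

```r
# Hypothetical helper: TRUE if `a` and `b` contain the same rows in any order.
is_row_permutation <- function(a, b) {
  if (nrow(a) != nrow(b) || !setequal(names(a), names(b))) return(FALSE)
  b <- b[names(a)]  # align column order with `a`
  sort_rows <- function(d) {
    # Sort by every column, left to right, then drop the old row names
    d <- d[do.call(order, unname(as.list(d))), , drop = FALSE]
    rownames(d) <- NULL
    d
  }
  isTRUE(all.equal(sort_rows(a), sort_rows(b)))
}

# Hypothetical helper: one string key per row (unit separator between fields).
row_keys <- function(d) do.call(paste, c(unname(as.list(d)), sep = "\x1f"))

# Hypothetical helper: TRUE if every row of `a` occurs in `b` at least as often.
is_row_subset <- function(a, b) {
  if (!setequal(names(a), names(b))) return(FALSE)
  ka <- table(row_keys(a))
  kb <- table(row_keys(b[names(a)]))
  all(names(ka) %in% names(kb)) &&
    all(as.vector(ka) <= as.vector(kb[names(ka)]))
}

# The data from the question
a <- data.frame(x1 = 1:10, x2 = 11:20, x3 = 1:2)
b <- a[order(a$x3, a$x1, decreasing = TRUE), ]

is_row_permutation(a, b)     # same rows, different order
is_row_subset(a[1:3, ], b)   # first three rows of a all occur in b
```

Sorting both data frames by all columns costs O(n log n) per data frame, so this should stay practical for many observations, but unlike compare it reports only TRUE or FALSE and says nothing about which transformations were needed.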