Why does order() in R generate NAs when passing in a subsetted dataframe?

I'm having a little trouble understanding what is going on here: it appears to me that both methods for ordering the data frame below are equivalent.

Our data frame:

cols <- c("chr", "id", "value")
df <- data.frame(c(1:5), c("ENSG1", "ENSG2", "ENSG3", "ENSG4", "ENSG5"), runif(5, 5.0, 10.0))
names(df) <- cols
df <- df[sample(nrow(df)), ]  # shuffle the rows
df

chr    id    value
5      ENSG5 8.913645
2      ENSG2 6.117744
4      ENSG4 8.558403
3      ENSG3 9.625546
1      ENSG1 6.105577

Now, method 1:

df[order(df[,c("chr","id")]),]

chr    id    value
1      ENSG1 6.105577
2      ENSG2 6.117744
3      ENSG3 9.625546
4      ENSG4 8.558403
5      ENSG5 8.913645
NA    <NA>       NA
NA    <NA>       NA
NA    <NA>       NA
NA    <NA>       NA
NA    <NA>       NA

This throws in NAs for some curious reason, whereas passing the df columns to order() individually, as in

method 2:

df[order(df$chr,df$id),]

chr    id    value
1      ENSG1 6.105577
2      ENSG2 6.117744
3      ENSG3 9.625546
4      ENSG4 8.558403
5      ENSG5 8.913645

alternatively does not.

Can someone explain why method 1 and method 2 are not interchangeable?

Solution

When we look at ?order, the vectors to sort (its ... argument) are documented as:

a sequence of numeric, complex, character or logical vectors, all of the same length, or a classed R object.

Nothing there suggests that it works on a data frame. "A classed R object" is a bit vague and implies that passing a data frame won't throw an error, but it certainly doesn't say "or a data frame".
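For contrast, here is the documented form: several vectors of the same length passed as separate arguments, where each later vector is only consulted to break ties in the earlier ones. The vectors x and y are made-up illustration data, not taken from the question.

```r
# Two hypothetical sort keys of equal length (illustration data only)
x <- c(2, 1, 2, 1)
y <- c("b", "z", "a", "c")

# order() sorts by x first; y is used only where x has ties
idx <- order(x, y)
idx
# [1] 4 2 3 1

# One index per element, so the result is as long as each key
length(idx) == length(x)
# [1] TRUE
```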

The Description says:

See the examples for how to use these functions to sort data frames, etc.

When you call order() on a data frame, you can see what happens:

order(data.frame(a = 1:5, b = 5:1))
# [1]  1 10  2  9  3  8  4  7  5  6

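It looks like it coerces the data frame to a single vector (the columns stacked end to end) and orders that, which is not generally very useful. This is why you get the NA rows when you run df[order(df[,c("chr","id")]),]. Your input data frame had 2 columns, so order() returned twice as many indices (10) as the data frame has rows (5). Indices 6 to 10 point past the last row, and asking a data frame for a row number greater than nrow(df) returns a row of NAs.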
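A minimal sketch of that mechanism, assuming (as the order(data.frame(a = 1:5, b = 5:1)) output suggests) that the data frame is treated as one stacked vector. Rather than calling order() on a multi-column data frame, whose behaviour may differ between R versions, it emulates the stacking with unlist(). The data frame is a fixed copy of the shuffled one in the question, so no sample() call is involved.

```r
# Fixed copy of the question's shuffled data frame (reproducible, no sample())
df <- data.frame(chr   = c(5L, 2L, 4L, 3L, 1L),
                 id    = c("ENSG5", "ENSG2", "ENSG4", "ENSG3", "ENSG1"),
                 value = c(8.913645, 6.117744, 8.558403, 9.625546, 6.105577),
                 stringsAsFactors = FALSE)

# Emulate the stacking: both key columns become one vector of length 10
stacked <- unlist(df[, c("chr", "id")], use.names = FALSE)
idx <- order(stacked)

length(idx)           # 10, i.e. nrow(df) * 2
sum(idx > nrow(df))   # 5 indices point past the last row

# Out-of-range row indices come back as all-NA rows
bad <- df[idx, ]
nrow(bad)             # 10
sum(is.na(bad$id))    # 5
```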

You have already found the correct way to order a data frame: pass the sort keys to order() as separate vectors. They can be individual columns of your data frame or any other vectors of the right length (nrow(df)).
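A short sketch of equivalent ways to do this, again on a fixed copy of the question's data frame. with() saves repeating df$, and do.call() spreads a list of columns into separate order() arguments, which helps when the key names are held in a character vector (keys is a hypothetical name). unname() stops column names from being matched against order()'s own arguments such as decreasing or method.

```r
df <- data.frame(chr   = c(5L, 2L, 4L, 3L, 1L),
                 id    = c("ENSG5", "ENSG2", "ENSG4", "ENSG3", "ENSG1"),
                 value = c(8.913645, 6.117744, 8.558403, 9.625546, 6.105577),
                 stringsAsFactors = FALSE)

# Method 2 from the question: one vector per sort key
by_cols <- df[order(df$chr, df$id), ]

# Same ordering, evaluating the column names inside df
by_with <- df[with(df, order(chr, id)), ]

# Keys given as a character vector (hypothetical `keys`):
# do.call() passes each column as its own argument to order()
keys <- c("chr", "id")
by_keys <- df[do.call(order, unname(as.list(df[keys]))), ]

identical(by_cols, by_with)   # TRUE
identical(by_cols, by_keys)   # TRUE
nrow(by_keys)                 # 5, no NA rows
```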