
Unexpected result using unique inside a data.table


Given a data.table (using version 1.9.5):

TEST <- data.table(1:20,rep(1:5,each=4, times=1))

If I run this:

TEST[unique(V2)]

I get this result:

   V1 V2
1:  1  1
2:  2  1
3:  3  1
4:  4  1
5:  5  2

Is this really the intended behaviour, or a bug? Or am I just not using it properly?

I was reading the "R book" and in an example they use TEST[unique(Vegetation),] and say it's intended to select a subset of rows unique for the vegetation.

I expected to get something like

   V1 V2
1:  1  1
2:  5  2
3:  9  3
4: 13  4
5: 17  5

though I understand that I would need to specify an aggregation criterion.


Solution

  • TEST[,unique(V2)] gives [1] 1 2 3 4 5, so TEST[unique(V2)] is evaluated as TEST[1:5], i.e. the values are used as row numbers. Since TEST[1:5] is supposed to give you the first 5 rows and that's what you get, there is no bug.

    To get your expected result, you can do this:

    TEST[!duplicated(V2)]
    #   V1 V2
    #1:  1  1
    #2:  5  2
    #3:  9  3
    #4: 13  4
    #5: 17  5
    

    or this:

    TEST[, V1[1], by = V2]
    #   V2 V1
    #1:  1  1
    #2:  2  5
    #3:  3  9
    #4:  4 13
    #5:  5 17
    

    or, as @Arun reminds me, there is now a data.table method for unique:

    unique(TEST, by="V2")
    #   V1 V2
    #1:  1  1
    #2:  5  2
    #3:  9  3
    #4: 13  4
    #5: 17  5
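
To make the explanation above concrete, here is a minimal sketch that checks the row-number interpretation and compares the three suggested approaches. It assumes a reasonably recent data.table release. The column names V1 and V2 are spelled out explicitly here (the original relies on the defaults), and the variable names idx, by_position, first_dup, first_uniq and first_by are made up for illustration.

```r
library(data.table)

# Same data as in the question, with the default column names written out
TEST <- data.table(V1 = 1:20, V2 = rep(1:5, each = 4))

# In j, unique(V2) returns the distinct values of the column
idx <- TEST[, unique(V2)]

# In i, the same expression is evaluated first and its result (1:5)
# is then used as row numbers, so this selects rows 1 to 5
by_position <- TEST[unique(V2)]

# Three ways to keep the first row for each value of V2
first_dup  <- TEST[!duplicated(V2)]               # logical row filter
first_uniq <- unique(TEST, by = "V2")             # data.table method for unique
first_by   <- TEST[, .(V1 = V1[1]), by = V2]      # grouped; columns come back as V2, V1

print(by_position)
print(first_dup)
```

The first two approaches keep every column of the matching rows, while the grouped form returns only the columns you ask for in j, which is where an aggregation rule other than "first row" could be plugged in.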