
Unique doesn't use keys as default anymore


I mainly use RStudio on a Mac. Recently I had to start using Windows, and there I found that unique() no longer returns the unique rows of a data.table based on its key. Here is an example:

library(data.table)

a <- c(2, 3, 3, 3, 3, 5, 6, 7)
b <- c("a", "a", "f", "g", "a", "d", "t", "l")
e <- data.table(a, b)
setkey(e, a)
key(e)    # this works fine
unique(e) # expected one row per value of the key column a

unique() only removes the row that is a duplicate across every column (row 5). The exact same code runs fine on my Mac.


Solution

  • That's because you have different data.table versions on the two machines. On the Mac you have a version older than 1.9.8 (which still uses the key by default), while on Windows you have a newer version (which doesn't).

    As stated in ?unique (in data.table V1.9.8+):

    By default all columns are being used. This was changed recently for consistency to data.frame methods. In version < 1.9.8 default was key(x)

    This means that from now on you need to specify the by argument explicitly, even if a key is already set; otherwise unique() will use all the columns.

    For your specific example, this works:

    unique(e, by = "a")
    #    a b
    # 1: 2 a
    # 2: 3 a
    # 3: 5 d
    # 4: 6 t
    # 5: 7 l
    

    Or, as @Frank mentioned in the comments, you can also pass the key to the by argument using unique(e, by = key(e)).
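
To see both behaviours side by side, here is a minimal sketch that rebuilds the e table from the question. It assumes data.table 1.9.8 or later is installed (on an older version, the plain unique(e) call would already deduplicate by key), and the names all_cols and by_key are only illustrative.

```r
library(data.table)

# Check the installed version; the default for `by` changed in 1.9.8.
packageVersion("data.table")

e <- data.table(a = c(2, 3, 3, 3, 3, 5, 6, 7),
                b = c("a", "a", "f", "g", "a", "d", "t", "l"))
setkey(e, a)

# Default in 1.9.8+: all columns are compared, so only the
# fully duplicated (3, "a") row is dropped.
all_cols <- unique(e)

# Explicit key: one row per value of `a`, which reproduces
# the pre-1.9.8 default.
by_key <- unique(e, by = key(e))

nrow(all_cols)  # 7 on data.table >= 1.9.8
nrow(by_key)    # 5
```

Using by = key(e) instead of hard-coding by = "a" keeps the call correct if the key is changed later.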