
duplicated function fails for ff date vectors


Hi, I am trying to remove duplicates from an ff vector that contains dates, using the duplicated function from the ffbase package and the following code:

v1 <- c("24-Mar-94", "24-Mar-94", "27-Mar-94", "28-Jun-1986", "29-Jul-1988", "28-Jun-1986", "15-Jan-1999", "13-Jan-1999")
v1.d <- as.Date(v1, format="%d-%b-%y")
v1.ff <- as.ff(v1.d)
v2 <- v1.ff[!duplicated(v1.ff)]

However, I get the following error:

Error in UseMethod("as.hi") : 
  no applicable method for 'as.hi' applied to an object of class "Date"

Is there any way around this problem without first having to coerce the ff vector to a RAM object?


Solution

  • Try this:

    library(ff)
    v1 <- c("24-Mar-94", "24-Mar-94", "27-Mar-94", "28-Jun-1986", "29-Jul-1988", "28-Jun-1986", "15-Jan-1999", "13-Jan-1999")
    v1.d <- as.Date(v1, format="%d-%b-%y")
    v1.ff <- as.ff(v1.d)
    # v1.ff[,] pulls all elements into an ordinary vector before duplicated() runs
    v2 <- v1.ff[!duplicated(v1.ff[,])]


    Output:

    > v1.d
    [1] "1994-03-24" "1994-03-24" "1994-03-27" "2019-06-28" "2019-07-29" "2019-06-28" "2019-01-15" "2019-01-13"
    > v2
    [1] "1994-03-24" "1994-03-27" "2019-06-28" "2019-07-29" "2019-01-15" "2019-01-13"
    

    The duplicate dates have been removed.

    ff objects usually have to be sliced or subset explicitly before ordinary functions can be applied to them. One way is the approach above: use [,] or [] (the latter is enough here, since this is a vector) to produce a regular vector holding all of the elements, and then call duplicated on that. Note that this does read the whole vector into RAM while the duplicates are being computed.
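
A side note on the output above: the inputs with four-digit years come back as 2019 because %y consumes only two digits, so "1986" is read as "19". The sketch below is one possible way to parse the mixed input in base R before deduplicating. The names four_digit and v1.unique are illustrative and not from the original post, and the month abbreviations assume an English locale.

```r
# Base R only; %b assumes English month abbreviations in the current locale.
v1 <- c("24-Mar-94", "24-Mar-94", "27-Mar-94", "28-Jun-1986",
        "29-Jul-1988", "28-Jun-1986", "15-Jan-1999", "13-Jan-1999")

# TRUE where the year part has four digits (illustrative helper name)
four_digit <- grepl("-[0-9]{4}$", v1)

# Start from an all-NA Date vector, then fill each group using its own format
v1.d <- rep(as.Date(NA), length(v1))
v1.d[four_digit]  <- as.Date(v1[four_digit],  format = "%d-%b-%Y")
v1.d[!four_digit] <- as.Date(v1[!four_digit], format = "%d-%b-%y")

# Deduplicate the in-RAM Date vector
v1.unique <- v1.d[!duplicated(v1.d)]
```

With the dates parsed this way, the result can be passed to as.ff exactly as in the original code.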