
No applicable method for 'anti_join' applied to an object of class "factor"


I want to identify the rows present in dataframe1 that are not present in dataframe2, based on a particular column. I used the code below to get that information:

diffId <- anti_join(dat$ID,datwe$ID)

Unfortunately, I encountered an error:

Error in UseMethod("anti_join") :
no applicable method for 'anti_join' applied to an object of class "factor"

I checked the class of the ID column in both data frames, and it turned out to be factor in each. I also tried extracting the column into a separate variable, assuming that might solve the issue, but with no luck:

fac1 <- datwe$ID
fac2 <- dat$ID
diffId <- anti_join(fac2,fac1)

Could you please share your thoughts?

Thanks


Solution

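  • Almost all dplyr functions operate on tbls (depending on the context, a data.frame, a data.table, a database connection, and so on), not on bare vectors such as a factor column, which is why anti_join(dat$ID, datwe$ID) has no applicable method. What you really want is to pass the data frames themselves, something like this: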

    > library(dplyr)
    > dat <- data.frame(ID=c(1, 3, 6, 4), x=runif(4))
    > datwe <- data.frame(ID=c(3, 5, 8), y=runif(3))
    > anti_join(dat, datwe, by='ID') %>% select(ID)
      ID
    1  4
    2  6
    3  1
    

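    Note that the row order of dat is not preserved in this output. (Current dplyr documentation states that filtering joins keep the rows of the first table in their original order, so the order you see may depend on the dplyr version.)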

    If the key columns are factors with different levels (unlike the numeric IDs in the example above), the join involves a conversion between factor and character.

    If you want to operate on vectors instead, you can use setdiff (available in both base R and dplyr):

    > setdiff(dat$ID, datwe$ID)
    [1] 1 6 4
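
For the factor case from the question, a minimal sketch might look like the following. The data here are made up for illustration (the real dat and datwe are not shown in the question), and converting the key to character up front is just one way to sidestep mismatched factor levels, not the only one.

```r
library(dplyr)

# Hypothetical data mirroring the question: ID is a factor in both
# data frames, and the two factors have different level sets.
dat   <- data.frame(ID = factor(c("a", "c", "f", "d")), x = 1:4)
datwe <- data.frame(ID = factor(c("c", "e", "h")), y = 1:3)

# Convert the key to character first, so the result does not depend on
# how the join reconciles the differing factor levels.
dat$ID   <- as.character(dat$ID)
datwe$ID <- as.character(datwe$ID)

# Pass the whole data frames, not the ID columns themselves.
diffId <- anti_join(dat, datwe, by = "ID")

# Vector-only alternative: just the IDs found in dat but not in datwe.
missing_ids <- setdiff(dat$ID, datwe$ID)
```

Here diffId keeps every column of dat for the unmatched rows, while missing_ids holds only the IDs. If dplyr is not available, dat[!dat$ID %in% datwe$ID, ] should select the same rows in base R.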