r data.table grouping indicator

Check whether two indicators are the same


I have a large data table with two indicator columns, ind1 and ind2, whose values may repeat. For example:

 library(data.table)

 set.seed(1)
 ind1 <- sample(1:3, 1000, replace = TRUE)
 ind2 <- c("a", "b", "c")[ind1]

 dt <- data.table(ind1 = ind1, ind2 = ind2)

I would now like to check whether these two indicators group the data the same way, i.e. two rows have the same ind1 if and only if they also have the same ind2. In the example above, this is the case by construction.


Solution

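  • You can group by ind2 and count the distinct values of ind1 in each group. If any count is greater than 1, the indicators don't group the data in the same way. Note that this covers one direction only (rows with the same ind2 share the same ind1); for the full "if and only if", run the same check with the roles of ind1 and ind2 swapped. Here's a way with base R: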

    any(with(dt, ave(ind1, ind2, FUN = function(x) length(unique(x)))) > 1)
    
    [1] FALSE # means ind1 and ind2 group the data in the same way
    

    Alternatively, you can check that every count equals 1 with all(), if that's easier to interpret:

    all(with(dt, ave(ind1, ind2, FUN = function(x) length(unique(x)))) == 1)
    
    [1] TRUE # means ind1 and ind2 group the data in the same way
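
Since the question asks for "if and only if", a check that covers both directions at once may be useful. The following is only a sketch using data.table's uniqueN(); the helper name same_grouping and its arguments are made up for illustration and are not part of the original answer. The idea: the number of distinct (ind1, ind2) pairs equals the number of distinct ind1 values exactly when each ind1 maps to a single ind2, and likewise with the roles swapped.

```r
library(data.table)

# Hypothetical helper: TRUE when columns a and b split the rows into the
# same groups, i.e. there is a one-to-one mapping between their values.
same_grouping <- function(dt, a = "ind1", b = "ind2") {
  n_a  <- uniqueN(dt[[a]])            # distinct values of the first indicator
  n_b  <- uniqueN(dt[[b]])            # distinct values of the second indicator
  n_ab <- uniqueN(dt, by = c(a, b))   # distinct (a, b) combinations
  n_a == n_ab && n_b == n_ab
}

# Data from the question: the indicators agree by construction.
set.seed(1)
ind1 <- sample(1:3, 1000, replace = TRUE)
dt <- data.table(ind1 = ind1, ind2 = c("a", "b", "c")[ind1])
same_grouping(dt)    # TRUE

# Each ind2 group has a single ind1, yet ind1 = 1 spans two ind2 values,
# so a one-directional check would pass while the groupings differ.
dt2 <- data.table(ind1 = c(1, 1, 2), ind2 = c("a", "b", "c"))
same_grouping(dt2)   # FALSE
```

Counting distinct pairs avoids grouping twice, which may matter on a large table.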