I am given a large data table with two indicators, ind1 and ind2, with possible repetitions. E.g.
library(data.table)
set.seed(1)
ind1 <- sample(1:3, 1000, replace = TRUE)
ind2 <- c("a", "b", "c")[ind1]
dt <- data.table(ind1 = ind1, ind2 = ind2)
I would now like to check whether these two indicators group the data the same way, i.e. two rows have the same indicator ind1 if and only if they also have the same indicator ind2. In the above example, this would be the case by construction.
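You can simply group by ind2 and count the distinct values of ind1, and vice versa. If any count is > 1, they don't group the data in the same way. Note that a single direction only shows that each ind2 value maps to one ind1 value; for the full "if and only if", repeat the check with the roles swapped. Here's a way with base R for the first direction: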
any(with(dt, ave(ind1, ind2, FUN = function(x) length(unique(x)))) > 1)
[1] FALSE # no ind2 group contains more than one distinct ind1 value
Alternatively, you can check that every count equals 1 using all(), if that's easier to interpret:
all(with(dt, ave(ind1, ind2, FUN = function(x) length(unique(x)))) == 1)
[1] TRUE # every ind2 group contains exactly one distinct ind1 value
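Since the data is already a data.table, the two-way check can also be sketched by counting distinct values with uniqueN(). The helper name same_grouping and the dt_bad counterexample below are hypothetical and not part of the original answer. The idea is that the two indicators group the data identically exactly when the number of distinct (ind1, ind2) pairs equals both the number of distinct ind1 values and the number of distinct ind2 values.

```r
library(data.table)

# Hypothetical helper (not from the original answer): TRUE when columns a and b
# induce the same grouping, i.e. the mapping between them is one-to-one.
same_grouping <- function(dt, a = "ind1", b = "ind2") {
  n_pairs <- uniqueN(dt, by = c(a, b))  # distinct (a, b) combinations
  n_pairs == uniqueN(dt, by = a) && n_pairs == uniqueN(dt, by = b)
}

# Data from the question: ind2 is derived from ind1, so the groupings agree.
set.seed(1)
ind1 <- sample(1:3, 1000, replace = TRUE)
dt <- data.table(ind1 = ind1, ind2 = c("a", "b", "c")[ind1])
same_grouping(dt)      # TRUE

# Counterexample: every ind2 value has a single ind1 value, but ind1 = 1
# spans two ind2 values, so a one-directional check would miss the mismatch.
dt_bad <- data.table(ind1 = c(1, 1, 2), ind2 = c("a", "b", "c"))
same_grouping(dt_bad)  # FALSE
```

Comparing counts avoids building per-row group statistics, which may matter on a large table, though that is an expectation rather than something measured here.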