I'm looking for a good way to check one-to-one relationships in the table of specified tags. I have three types of tags: primary (P) and two secondary ones (A) and (B) in a table like this:
P1 A1 B1
P1 A1 B2
P2 A2 B2
For each primary tag (P) multiple types of secondary tags can exist, and I need to check if they have one-to-one relationship. That is for each P each type of tag A can be associated with only one type of tag B and vice versa. In a toy table above, for example, only the last row is good.
Now,for small table the check can be done quite easily: group the table by primary tag and then for each group do two more groupings: once by secondary tag A and once by B. In case of one-to-one relationship each sub-group would be of size one.
The issue is that this approach becomes extremely slow if I need to traverse huge tables (millions or ten millions of rows). What are good ways to speed up the process?
As noted by user2554330, you can group by all three. Then in a large table, add a count element to easily sort or filter:
df <- df %>%
group_by(col1, col2, col3) %>%
summarise(count = n())