Search code examples
rlarge-dataone-to-one

Checking one-to one relationship R script


I'm looking for a good way to check one-to-one relationships in the table of specified tags. I have three types of tags: primary (P) and two secondary ones (A) and (B) in a table like this:

P1 A1 B1

P1 A1 B2

P2 A2 B2

For each primary tag (P) multiple types of secondary tags can exist, and I need to check if they have one-to-one relationship. That is for each P each type of tag A can be associated with only one type of tag B and vice versa. In a toy table above, for example, only the last row is good.

Now,for small table the check can be done quite easily: group the table by primary tag and then for each group do two more groupings: once by secondary tag A and once by B. In case of one-to-one relationship each sub-group would be of size one.

The issue is that this approach becomes extremely slow if I need to traverse huge tables (millions or ten millions of rows). What are good ways to speed up the process?


Solution

  • As noted by user2554330, you can group by all three. Then in a large table, add a count element to easily sort or filter:

    df <- df %>%
    group_by(col1, col2, col3) %>%
    summarise(count = n())