Search code examples
rdataframeoverlap

R collapse duplicate pairs (in any order) across dataframe columns and edit 3rd column?


I used rbind to join 2 dataframes, with a column denoting its source, resulting in

    from | to | source
1     A    B      X    
2     C    D      Y
3     B    A      Y
...

I would like to look for overlapping pairs, regardless of "order", combine those pairs, then edit the source column to something else, e.g. "Z".

In the above example, rows 1 and 3 would be flagged as overlapping, so they will be combined and modified.

So the desired output would look something like

    from | to | source
1     A    B      Z    
2     C    D      Y
...

How can this be done?


Solution

  • You can try the code below

    unique(
      transform(
        transform(
          df,
          from = pmin(from, to),
          to = pmax(from, to)
        ),
        source = ave(source, from, to, FUN = function(x) ifelse(length(x) > 1, "Z", x))
      )
    )
    

    which gives

      from to source
    1    A  B      Z
    2    C  D      Y