Search code examples
rdataframearray-merge

Merging two dataframes - union of IDs, "ANY" over values


I have two tables A and B, with the following 3 columns/variables:

  1. id: A unique patient identifier (char strings)
  2. cancer: Binary variable for whether cancer was present (0=no, 1=yes)
  3. hiv: Binary variable indicating whether HIV was present (0=no, 1=yes)

The two tables may have different sets of id's but they have the same 3 columns.

What I want to do is to merge them such that the merged table contains the following:

  1. id: The entire set of unique patient identifiers from both A and B (i.e. the union)
  2. cancer: 1 if the patient had cancer=1 in either A or B. 0 otherwise, including if missing.
  3. hiv: Same as cancer except for hiv

I'm not sure what's the best way to go about doing this. Base R, tidyverse or data.table examples are all acceptable.


Solution

  • I don't know if you want to join them. I would think, with the same columns, you may want to append them:

    library(tidyverse)
    
    A %>%
    bind_rows(B) %>%
    group_by(id) %>%
    summarise(cancer = max(cancer, na.rm = T),
              hiv = max(cancer, na.rm = T) %>%
    ungroup()