I have two tables A
and B
, with the following 3 columns/variables:
id
: A unique patient identifier (char strings)cancer
: Binary variable for whether cancer was present (0
=no, 1
=yes)hiv
: Binary variable indicating whether HIV was present (0
=no, 1
=yes)The two tables may have different sets of id
's but they have the same 3 columns.
What I want to do is to merge them such that the merged table contains the following:
id
: The entire set of unique patient identifiers from both A
and B
(i.e. the union)cancer
: 1
if the patient had cancer=1
in either A
or B
. 0
otherwise, including if missing.hiv
: Same as cancer
except for hiv
I'm not sure what's the best way to go about doing this. Base R, tidyverse or data.table examples are all acceptable.
I don't know if you want to join
them. I would think, with the same columns, you may want to append them:
library(tidyverse)
A %>%
bind_rows(B) %>%
group_by(id) %>%
summarise(cancer = max(cancer, na.rm = T),
hiv = max(cancer, na.rm = T) %>%
ungroup()