I have the following data frame containing medicine code according to the route of administration:
code <- data.frame(inn = c("ibuprofen", "ibuprofen", "ibuprofen", "fusidic acid", "fusidic acid"),
route = c("unknown", "unknown", "unknown", "oral", "topical"),
atc = c("R02AX02", "G02CC01", "M01AE01", "J01XC01", "D06AX01"))
inn route atc
1 ibuprofen unknown R02AX02
2 ibuprofen unknown G02CC01
3 ibuprofen unknown M01AE01
4 fusidic acid oral J01XC01
5 fusidic acid topical D06AX01
And another one containing patient treatment and event:
event <- data.frame(id = c(1, 1, 2),
inn = c("ibuprofen", "fusidic acid", "fusidic acid"),
route = c("unknown", "oral", "topical"),
event = c(TRUE, FALSE, TRUE))
id inn route event
1 1 ibuprofen unknown TRUE
2 1 fusidic acid oral FALSE
3 2 fusidic acid topical TRUE
I need to merge those data frames to get the following result:
inn route id event atc
1 fusidic acid oral 1 FALSE J01XC01
2 fusidic acid topical 2 TRUE D06AX01
3 ibuprofen unknown 1 TRUE NA
I do not get this result with a simple merge
:
merge(x = event,
y = code)
inn route id event atc
1 fusidic acid oral 1 FALSE J01XC01
2 fusidic acid topical 2 TRUE D06AX01
3 ibuprofen unknown 1 TRUE R02AX02
4 ibuprofen unknown 1 TRUE G02CC01
5 ibuprofen unknown 1 TRUE M01AE01
I thought of two solutions, but I did not manage to implement any:
code
data frame before the merge
to set atc
to NA
if there is different atc
for a group of inn
and route
(this seems more appropriate)merge
to set atc
to NA
if there is different atc
for a group of inn
, route
and id
How can I do it in base R? Is there another better way? I work in a restrictive environment where I only have access to base R.
Here's a straightforward way to accomplish option 2. Starting with the result of a simple merge:
mrg <- merge(x = event,
y = code)
inn route id event atc
1 fusidic acid oral 1 FALSE J01XC01
2 fusidic acid topical 2 TRUE D06AX01
3 ibuprofen unknown 1 TRUE R02AX02
4 ibuprofen unknown 1 TRUE G02CC01
5 ibuprofen unknown 1 TRUE M01AE01
We then check which rows are duplicated (dropping the atc
variable). We need to use duplicated twice because it actually finds duplicate rows, not rows that have duplicates. So, it would catch rows 4 and 5, but not 3 – to get that, we need to repeat duplicated
from the opposite direction. Read more here: Finding ALL duplicate rows, including “elements with smaller subscripts”:
mrg$atc <- ifelse(duplicated(mrg[,-5]) | duplicated(mrg[,-5], fromLast = T),
NA,
mrg$atc)
mrg
inn route id event atc
1 fusidic acid oral 1 FALSE J01XC01
2 fusidic acid topical 2 TRUE D06AX01
3 ibuprofen unknown 1 TRUE <NA>
4 ibuprofen unknown 1 TRUE <NA>
5 ibuprofen unknown 1 TRUE <NA>
If you want to get rid of the duplicate rows 4 and 5, just run duplicated
one more time to drop them:
mrg[!duplicated(mrg),]
inn route id event atc
1 fusidic acid oral 1 FALSE J01XC01
2 fusidic acid topical 2 TRUE D06AX01
3 ibuprofen unknown 1 TRUE <NA>