I have a dataframe xmpl
with the grouping or cluster variable ID
. I want to add a new column a1_momo
which is filled with one specific value from column a1
for each row per ID
based on the following conditions:
'BM' == 1 & 'W2' == 0 & 'dt' == 1
that lead to one clear identified single match per cluster, except for no such row is existing.
xmpl <- data.frame(ID = c(1,1,1,1,1,2,2,2,2,3),
BM = c(0,0,1,1,1,0,0,1,1,1),
W2 = c(0,0,0,0,1,0,0,0,1,0),
dt = c(1,2,1,2,1,1,2,2,1,1),
a1 = c(3,2,1,2,2,3,2,2,3,NA))
> xmpl
ID BM W2 dt a1
1 1 0 0 1 3
2 1 0 0 2 2
3 1 1 0 1 1
4 1 1 0 2 2
5 1 1 1 1 2
6 2 0 0 1 3
7 2 0 0 2 2
8 2 1 0 2 2
9 2 1 1 1 3
10 3 1 0 1 NA
Solely xmpl[3,]
fits the conditions for 'ID' == 1
so I want to fill the value of xmpl$a1[3]
to each row of the cluster into the new column a1_momo
.
For 'ID' == 2
there is no match to conditions so therefore a1_momo
should be filled with NA
for the rows clustered in 'ID' == 2
.
Lastly, 'ID' == 3
contains a row matching all conditions but has no value for a1
.
The result I wish to get:
> xmpl
ID BM W2 dt a1 a1.momo
1 1 0 0 1 3 1
2 1 0 0 2 2 1
3 1 1 0 1 1 1
4 1 1 0 2 2 1
5 1 1 1 1 2 1
6 2 0 0 1 3 NA
7 2 0 0 2 2 NA
8 2 1 0 2 2 NA
9 2 1 1 1 3 NA
10 3 1 0 1 NA NA
My limited knowledge in prorgramming with R led me to the following conclusion:
I could produce a new dataframe that is grouped for my cluster variable so I have only one row per ID
. The only function I know to do nearly the thing I want, is aggregate
but it needs a summary function for specification. Instead of that I need someting to define my conditions by which the new dataframe can be filled with the one specific value (or NA) per cluster, e.g. a ifelse
-specification.
So there would be something like:
xmpl.2<-not.aggregate(xmpl[,'a1'],list(xmpl$ID),ifelse(...))
> xmpl.2
Group.1 a1
1 1 1
2 2 NA
3 3 NA
Afterwards I imagined something like merge
-ing it together by
the grouping variable ID
with the suffix .momo
. (because of course the real data contains more than one "a1
")
xmpl<-merge(xmpl,xmpl.2,by=1,suffixes=c("",".momo"))
My questions therefore are:
[package::function]
do I need to use then instead of aggregate
?ifelse
-functions?Days of research ended up with nothing so I'm typing this example. I guess there's a more elegant solution I can't see. Very grateful for help.
I bet there's a concise one-liner with base R, but here's a simple dplyr approach. I join the data to a version that extracts matching rows, one per ID/a1.mono. I'm not sure how you want to deal with potential situations where there are multiple matching rows within an ID, with different a1 values.
library(dplyr)
xmpl |>
left_join(
xmpl |>
filter(BM == 1 & W2 == 0 & dt == 1 & !is.na(a1)) |>
mutate(a1.mono = a1) |>
distinct(ID, a1.mono)
)