I have a data frame (about 1 million rows) that looks like this (Treatment can take many different character values; I have simplified it for the question):
ID Position Treatment
--20AxECvv- 0 A
--20AxECvv- -1 A
--20AxECvv- -2 A
--h9INKewQf- 0 A
--h9INKewQf- -1 B
zZU7a@8jN 0 B
QUeSNEXmdB 0 C
QUeSNEXmdB -1 C
qu72Ql@h79 0 C
I want to keep only the IDs with an exclusive treatment, in other words the IDs that received only one treatment, even if they received it several times. After that, I want to count the number of IDs for each treatment. The result would be:
ID Position Treatment
--20AxECvv- 0 A
--20AxECvv- -1 A
--20AxECvv- -2 A
zZU7a@8jN 0 B
QUeSNEXmdB 0 C
QUeSNEXmdB -1 C
qu72Ql@h79 0 C
And the count of IDs per treatment:
A : 1
B : 1
C : 2
I have no idea how to solve this, maybe with a loop within a loop, but I am a beginner with R.
We can use uniqueN to check the number of unique 'Treatment' values for each 'ID' and subset based on that:
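library(data.table)
# setDT() converts df1 to a data.table by reference; for each ID,
# return its rows (.SD) only if it has exactly one distinct Treatment
dt <- setDT(df1)[, if (uniqueN(Treatment) == 1) .SD, by = ID]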
dt
# ID Position Treatment
#1: --20AxECvv- 0 A
#2: --20AxECvv- -1 A
#3: --20AxECvv- -2 A
#4: zZU7a@8jN 0 B
#5: QUeSNEXmdB 0 C
#6: QUeSNEXmdB -1 C
#7: qu72Ql@h79 0 C
Then we count the number of unique 'ID' values per 'Treatment':
dt[, .(Count = uniqueN(ID)), Treatment]
# Treatment Count
#1: A 1
#2: B 1
#3: C 2
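If data.table is not available, the same two steps can be sketched in base R. This is only an illustrative alternative, not part of the answer above: the data frame is rebuilt by hand from the question's sample, and the names n_trt, keep_ids, res and counts are made up for the example. It may also be slower than data.table on a million rows.

```r
# Sample data copied from the question (assumed column types)
df1 <- data.frame(
  ID = c("--20AxECvv-", "--20AxECvv-", "--20AxECvv-",
         "--h9INKewQf-", "--h9INKewQf-", "zZU7a@8jN",
         "QUeSNEXmdB", "QUeSNEXmdB", "qu72Ql@h79"),
  Position = c(0, -1, -2, 0, -1, 0, 0, -1, 0),
  Treatment = c("A", "A", "A", "A", "B", "B", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# Step 1: number of distinct treatments for each ID
n_trt <- tapply(df1$Treatment, df1$ID, function(x) length(unique(x)))

# IDs that received exactly one treatment
keep_ids <- names(n_trt)[n_trt == 1]

# Keep only the rows belonging to those IDs
res <- df1[df1$ID %in% keep_ids, ]

# Step 2: one row per (ID, Treatment) pair, then count IDs per treatment
counts <- table(unique(res[c("ID", "Treatment")])$Treatment)

print(res)
print(counts)
```

On the sample data this should drop the two rows of the ID treated with both A and B, and give the same per-treatment counts as the data.table version.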