How to select/count rows in a column based on multiple conditions


I have a data frame (about 1 million rows) that looks like this (the Treatment column can take many possible character values; I have simplified it for the question):

ID              Position            Treatment
--20AxECvv-         0           A
--20AxECvv-         -1          A
--20AxECvv-         -2          A
--h9INKewQf-        0           A
--h9INKewQf-        -1          B
zZU7a@8jN           0           B
QUeSNEXmdB          0           C
QUeSNEXmdB          -1          C
qu72Ql@h79          0           C

I want to keep only the IDs with an exclusive treatment, in other words the IDs that received only one kind of treatment, even if it was given several times. Then I want to count the number of IDs for each treatment. The result would be:

ID              Position            Treatment
--20AxECvv-         0           A
--20AxECvv-         -1          A
--20AxECvv-         -2          A
zZU7a@8jN           0           B
QUeSNEXmdB          0           C
QUeSNEXmdB          -1          C   
qu72Ql@h79          0           C

And the number of IDs per treatment:
A : 1 
B : 1
C : 2

I have no idea how to solve this. Maybe with a loop within a loop, but I am a beginner with R.


Solution

  • We can use uniqueN to count the distinct 'Treatment' values for each 'ID' and keep only the groups where that count is 1

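    library(data.table)
    # setDT() converts df1 to a data.table by reference; within each ID group,
    # .SD (the group's rows) is returned only if there is one distinct Treatment
    dt <- setDT(df1)[, if(uniqueN(Treatment)==1) .SD, ID]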
    dt
    #            ID Position Treatment
    #1: --20AxECvv-        0         A
    #2: --20AxECvv-       -1         A
    #3: --20AxECvv-       -2         A
    #4:   zZU7a@8jN        0         B
    #5:  QUeSNEXmdB        0         C
    #6:  QUeSNEXmdB       -1         C
    #7:  qu72Ql@h79        0         C
    

    and then we count the unique 'ID' values per 'Treatment'

    dt[, .(Count = uniqueN(ID)), Treatment]
    #    Treatment Count
    #1:         A     1
    #2:         B     1
    #3:         C     2
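
To try the two steps above end to end, here is a minimal sketch that rebuilds the sample data from the question. The values are copied from the question's table and df1 is the name used in the answer; the names dt_all and counts are only illustrative. It uses as.data.table() instead of setDT() so that df1 is not modified by reference, which is a choice made for this sketch and not part of the original answer.

```r
library(data.table)

# Sample data copied from the question
df1 <- data.frame(
  ID = c("--20AxECvv-", "--20AxECvv-", "--20AxECvv-",
         "--h9INKewQf-", "--h9INKewQf-", "zZU7a@8jN",
         "QUeSNEXmdB", "QUeSNEXmdB", "qu72Ql@h79"),
  Position = c(0, -1, -2, 0, -1, 0, 0, -1, 0),
  Treatment = c("A", "A", "A", "A", "B", "B", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so df1 stays a plain data.frame
dt_all <- as.data.table(df1)

# Step 1: keep only the IDs that have exactly one distinct Treatment
dt <- dt_all[, if (uniqueN(Treatment) == 1) .SD, by = ID]

# Step 2: count the distinct IDs per Treatment
counts <- dt[, .(Count = uniqueN(ID)), by = Treatment]
print(counts)
```

On this sample, '--h9INKewQf-' is dropped because it received both A and B, which leaves 7 rows and the counts A = 1, B = 1, C = 2, matching the expected result in the question. No explicit loop is needed: the grouping by ID does the per-ID check.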