python pandas loops dataframe minimum

How to count rows based on the variable position in a column in Python


I have a data frame that looks like this (Treatment can take many possible character values; I have simplified it for the question):

ID              Position            Treatment
--20AxECvv-         0           A
--20AxECvv-         -1          A
--20AxECvv-         -2          A
--h9INKewQf-        0           A
--h9INKewQf-        -1          B
--h9INKewQf-        -2          B
--h9INKewQf-        -3          B
--h9INKewQf-        -4          B
--h9INKewQf-        -5          B
--h9INKewQf-        -6          B
--h9INKewQf-        -7          B
zZU7a@8jN           0           B
QUeSNEXmdB          0           C
QUeSNEXmdB          -1          C
qu72Ql@h79          0           C

For each ID, I just want to keep the row with the minimum position (and its treatment), using Python.

The result would be:

ID              Position            Treatment
--20AxECvv-         -2          A
--h9INKewQf-        -7          B
zZU7a@8jN           0           B
QUeSNEXmdB          -1          C
qu72Ql@h79          0           C

I am a beginner. I think I have to use a groupby and filter by the minimum position, but I have no idea how to write it. Thanks in advance :)


Solution

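  • You can sort your dataframe by Position using df.sort_values and then drop duplicate IDs using df.drop_duplicates. Sorting is ascending by default and drop_duplicates keeps the first occurrence, so the row with the minimum Position is the one kept for each ID: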

    In [795]: df.sort_values('Position').drop_duplicates('ID')
    Out[795]: 
                  ID  Position Treatment
    10  --h9INKewQf-        -7         B
    2    --20AxECvv-        -2         A
    13    QUeSNEXmdB        -1         C
    11     zZU7a@8jN         0         B
    14    qu72Ql@h79         0         C
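
Since the question mentions groupby, here is a minimal sketch of that route as an alternative, not necessarily the approach the answer above intended. It rebuilds the sample data from the question (the construction of df is an assumption about how the data is loaded) and uses groupby with idxmin to pick, for each ID, the index label of the row with the smallest Position. Passing sort=False is a choice made here to keep the IDs in their original order of appearance.

```python
import pandas as pd

# Sample data rebuilt from the question; how df is actually loaded is assumed.
df = pd.DataFrame({
    "ID": (["--20AxECvv-"] * 3 + ["--h9INKewQf-"] * 8 + ["zZU7a@8jN"]
           + ["QUeSNEXmdB"] * 2 + ["qu72Ql@h79"]),
    "Position": [0, -1, -2, 0, -1, -2, -3, -4, -5, -6, -7, 0, 0, -1, 0],
    "Treatment": ["A"] * 4 + ["B"] * 8 + ["C"] * 3,
})

# For each ID, find the index label of the row holding the minimum Position.
# sort=False keeps the groups in order of first appearance.
idx = df.groupby("ID", sort=False)["Position"].idxmin()

# Select those rows; the original index labels are preserved.
result = df.loc[idx]
print(result)
```

Both approaches select the same rows for this data. The groupby version keeps the IDs in their original order, while the sort-based version orders the rows by Position. If several rows of one ID tie for the minimum, idxmin returns the first of them.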