Compare and drop consecutive rows in a pandas DataFrame


I need to compare two consecutive rows in a dataframe, for example:

df:

time        door       name
00:01:10    in         alex
00:01:10    in         alex
02:01:10    out        alex
03:01:10    in         alex
04:01:10    out        alex
04:01:10    out        alex

I need to drop a row when its door value (in or out) is the same as in the previous row.

This is a part of my code:

import pandas as pd

file_name='test.xlsx'

df = pd.read_excel(file_name, header=0, index= False)
mydf = df.sort_values(by='time')
for i in range (len(mydf)):
    if (mydf[['door']] != mydf[['door']].shift(-1)).any(axis=1):
        print('ok')
    else:
        print ('nok')

I got this error:

if ((mydf[['Door Name']] != mydf[['Door Name']].shift(-1).any(axis=1))):

  File "C:\Users\khou\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 1478, in __nonzero__
    .format(self.__class__.__name__))

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I don't know how to fix it. Any help would be really appreciated.


Solution

  • You can first collect the positions of the rows that repeat the previous row's door value, then drop those rows:

    Code

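    import pandas as pd

    mydf = pd.DataFrame({'time': ['00:01:10', '00:01:10', '02:01:10', '03:01:10', '04:01:10', '04:01:10'],
                         'door': ['in', 'in', 'out', 'in', 'out', 'out'],
                         'name': ['alex', 'alex', 'alex', 'alex', 'alex', 'alex']})

    # Collect the positions of rows whose door matches the previous row's door.
    idx = []
    for i in range(len(mydf)):
        if i == 0:
            # The first row has no predecessor, so it is always kept.
            print('index ' + str(i) + ' ok')
        elif mydf['door'].iloc[i] != mydf['door'].iloc[i - 1]:
            # Compare two scalar values, which gives a plain True/False.
            print('index ' + str(i) + ' ok')
        else:
            print('index ' + str(i) + ' nok')
            idx.append(i)

    # Drop the collected rows by position (a flat list, not a nested one).
    mydf.drop(mydf.index[idx], inplace=True)

    print('\n', mydf)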
    

    Output

    index 0 ok
    index 1 nok
    index 2 ok
    index 3 ok
    index 4 ok
    index 5 nok
    
            time door  name
    0  00:01:10   in  alex
    2  02:01:10  out  alex
    3  03:01:10   in  alex
    4  04:01:10  out  alex
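
  • The loop above can probably be replaced by a vectorized comparison. The original error arises because comparing a column with its shifted copy yields a whole column of booleans, which `if` cannot reduce to a single True/False. That boolean column can be used as a row mask instead. The sketch below is one possible approach, not the only one. It reuses the sample data from the answer, and the names `keep` and `deduped` are made up for illustration.

```python
import pandas as pd

mydf = pd.DataFrame({
    'time': ['00:01:10', '00:01:10', '02:01:10', '03:01:10', '04:01:10', '04:01:10'],
    'door': ['in', 'in', 'out', 'in', 'out', 'out'],
    'name': ['alex'] * 6,
})

# shift() moves every value down one row, so each row is compared with
# the row before it. The first row is compared with a missing value,
# which counts as "different", so the first row is kept.
keep = mydf['door'] != mydf['door'].shift()

# Use the boolean Series as a mask instead of testing it in an `if`.
deduped = mydf[keep]

print(deduped)
```

    Note that this compares only the door column. If the data held several people, the comparison would likely need to be done per name (for example with a groupby), which is beyond what the question shows.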