python pandas dataframe error-handling row

Detecting erroneous consecutive rows without change in pandas dataframe

Let's say that I have the following DataFrame:

# Import pandas library
import pandas as pd
  
# initialize list of lists
data = [['tom', 10], ['tom', 10], ['sam', 23], ['sam', 23], ['sam', 23], ['alice', 23], ['alice', 30], ['alice', 30]]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])
  
# print dataframe.
df

    Name  Age
0    tom   10
1    tom   10
2    sam   23
3    sam   23
4    sam   23
5  alice   23
6  alice   30
7  alice   30

A data error occurs in the data where the 'Age' value does not change with a new consecutive 'Name' entry. In other words, row 5 contains a data entry error. How could I go about detecting and printing the rows in which this occurs? Thank you.

Solution

You can compare successive values:

# are the successive Age identical?
m1 = df['Age'].diff().eq(0)
# are the successive Name different?
m2 = df['Name'].ne(df['Name'].shift())

# show the rows where both conditions are True
df[m1&m2]

output:

    Name  Age
5  alice   23

To assign a value:

df.loc[m1&m2, 'comment'] = 'ERROR'

    Name  Age comment
0    tom   10     NaN
1    tom   10     NaN
2    sam   23     NaN
3    sam   23     NaN
4    sam   23     NaN
5  alice   23   ERROR
6  alice   30     NaN
7  alice   30     NaN

To map two values:

import numpy as np
df['error'] = np.where(m1&m2, 'yes', 'no')

    Name  Age error
0    tom   10    no
1    tom   10    no
2    sam   23    no
3    sam   23    no
4    sam   23    no
5  alice   23   yes
6  alice   30    no
7  alice   30    no