How to normalize every occurrence of the term 'Heb' to "HEB" in a pandas DataFrame


In a pandas DataFrame, I'm trying to normalize every occurrence of the term 'Heb' to "HEB" using sidetable. This worked: (df['Description'].str.contains('H-E-B', case=False, regex=False), 'HEB'), but this did not: (df['Description'].str.contains(['HEB', 'H-E-B', 'Heb online'], case=False, regex=False), 'HEB')

The error message reads: "AttributeError: 'list' object has no attribute 'upper'"
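A minimal reproduction may help pin down the failure. This sketch assumes a small column built from the sample descriptions in the input table, created with dtype=object (an assumption chosen to match the classic object-backed string column). It shows that a single literal pattern returns a boolean mask while a list of patterns raises. On classic pandas the exception is the AttributeError quoted above; versions with a dedicated string dtype may raise a different exception type, so the sketch only checks that one is raised.

```python
import pandas as pd

# Sample descriptions from the question; dtype=object is an assumption that
# reproduces the classic object-backed string column
df = pd.DataFrame(
    {'Description': ['HEB', 'HEB#123', 'HEB online', 'H-E-B Gas', 'Heb Carwash']},
    dtype=object,
)

# One literal pattern works and yields a boolean mask
single = df['Description'].str.contains('H-E-B', case=False, regex=False)
print(single.tolist())  # [False, False, False, True, False]

# A list of patterns fails: with case=False, pandas upper-cases the pattern,
# and a list has no .upper() method
error_message = None
try:
    df['Description'].str.contains(['HEB', 'H-E-B', 'Heb online'], case=False, regex=False)
except (AttributeError, TypeError) as exc:
    error_message = str(exc)
print(error_message)
```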

Expected input:

Description    Amount
HEB            100.00
HEB#123        100.00
HEB online     100.00
H-E-B Gas      100.00
Heb Carwash    100.00

Expected output:

Description    Amount
HEB            100.00
HEB            100.00
HEB            100.00
HEB            100.00
HEB            100.00

Solution

  • Series.str.contains() expects a single string pattern, but you are passing it a list of strings. With case=False and regex=False, pandas internally calls .upper() on the pattern, and a list has no such method, which is why you get the AttributeError.

    To replace every description that contains some variation of 'Heb' with 'HEB', you can list all expected variations, then loop over the rows of your DataFrame and check each description against them:

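    import pandas as pd
    
    # Sample DataFrame
    data = {'Description': ['HEB', 'HEB#123', 'HEB online', 'H-E-B Gas', 'Heb Carwash'],
            'Amount': [100.00, 100.00, 100.00, 100.00, 100.00]}
    
    df = pd.DataFrame(data)
    
    # Every spelling that should be normalized to 'HEB'
    patterns = ['HEB', 'H-E-B', 'Heb']
    
    for index, row in df.iterrows():
        description = row['Description']
        for pattern in patterns:
            # Case-insensitive substring test, mirroring case=False in the question
            if pattern.upper() in description.upper():
                df.at[index, 'Description'] = 'HEB'
                break  # one match is enough; skip the remaining patterns
    
    print(df)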
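The row loop works, but pandas can do the same check in one vectorized call. A sketch of that alternative, assuming the same sample data: instead of passing a list, the spellings are joined into a single regular expression with "|", and re.escape keeps any regex metacharacters in them literal. The list below is the one from the question; a boolean mask then selects the rows whose whole description is replaced, as in the expected output.

```python
import re

import pandas as pd

# Sample data from the question
data = {'Description': ['HEB', 'HEB#123', 'HEB online', 'H-E-B Gas', 'Heb Carwash'],
        'Amount': [100.00, 100.00, 100.00, 100.00, 100.00]}
df = pd.DataFrame(data)

# The spellings from the question, escaped and OR-ed into one pattern
patterns = ['HEB', 'H-E-B', 'Heb online']
regex = '|'.join(re.escape(p) for p in patterns)

# One case-insensitive test over the whole column instead of a Python-level loop
mask = df['Description'].str.contains(regex, case=False, regex=True)

# Overwrite the full description of every matching row
df.loc[mask, 'Description'] = 'HEB'

print(df)
```

Because matching is case-insensitive, 'HEB' alone already covers 'HEB online' and 'Heb Carwash'; the longer spellings are kept only to mirror the question's list.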