python pandas lambda jupyter-notebook

Cleaning up multiple functions and lambdas in Jupyter Notebook


My company tracks rejection issues in a third-party system. Any given ticket can have multiple reasons for rejection. My coworker exports the list of rejected tickets to an Excel file, which is ultimately used for data visualization.

I created a Jupyter Notebook file that will split out the reasons into individual columns which are true or false. There are currently 10 possible reasons, so I have 10 separate functions that check if each value is true, and run 10 separate lambdas. It works perfectly, but it is not very clean or maintainable.

I am struggling to find the right way (or even just a way that works) to combine all those functions and lambdas into cleaner code.

I have a series of 10 functions, one for each reason, that are almost identical:

def reason_one (x):
    
    value = 0
    
    if 'reason_one' in x:
        value = 1
    else:
        pass
        
    return value
def reason_two (x):
    
    value = 0
    
    if 'reason_two' in x:
        value = 1
    else:
        pass
        
    return value

and so on, for all 10 reasons we currently use.

Then, I run 10 nearly identical lambdas, one after the other:

df['Reason One'] = df['Labels'].map(lambda x: reason_one(x))
df['Reason Two'] = df['Labels'].map(lambda x: reason_two(x))

Is there a way to clean this up? Ideally, I would like to create a dictionary that has all the reason codes and the columns they should be named, then loop through the Labels column on the dataframe for each possible value, adding a column each time.

I have my dictionary set up:

error_list = {
    'reason_one': 'Reason One',
    'reason_two': 'Reason Two',
    'reason_three': 'Reason Three',
    'reason_four': 'Reason Four'
}

I like this because my coworker would be able to just change that list and run the notebook and have everything work. For example, he might add a new reason; or edit the column name for a given reason code to be more clear.

My idea was to create a function that takes in a dictionary and a column, iterates over the dictionary keys, appends either 0 or 1 to an empty list, then uses that list to create a new column.

I got this far:

def breakout_columns (errors, column):
    
    column_values = []
    
    for key in errors:
        
        if key in column:
            value = 1
        else:
            value = 0
        
        column_values.append(value)
    
        print(column_values)

That does indeed produce a list with 10 values when I run it, but they are all 0s, even when some of them should be 1. I looked for resources on iterating over dataframe rows, but I have not found anything close to what I am trying to do.
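
A likely cause of the all-zero list, assuming the function was called with the whole column (for example `breakout_columns(error_list, df['Labels'])`): the `in` operator on a pandas Series tests the index labels, not the values. The sketch below uses hypothetical sample data, since the real Labels values are not shown, and contrasts that with a per-row substring test.

```python
import pandas as pd

# Hypothetical sample data; the real "Labels" values are not shown above.
labels = pd.Series(["reason_one, reason_two", "reason_two", "other"])

# `in` on a Series checks the index labels (0, 1, 2 here), not the values,
# so a reason code is never found this way.
found_in_series = "reason_two" in labels
found_index_label = 1 in labels
print(found_in_series, found_index_label)

# Applying the substring test to each row's text gives the intended flags.
flags = labels.map(lambda text: int("reason_two" in text))
print(flags.tolist())
```

The per-row test is what the original ten functions do when they are passed to `map`, which is why those work while the combined function does not.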

Beyond this piece not working, I am concerned my approach is inherently flawed and either (a) I should be doing something completely different to try to clean things up; or (b) what I am trying to do is not possible or does not make sense, so I need to just stick with 10 functions and 10 lambdas.

Any guidance would be greatly appreciated!


Solution

  • You can loop over error_list and build each new column by comparing the Labels column to the reason code, then cast to int if you want 0 or 1 instead of False and True. Note that == matches only rows whose label is exactly that reason code:

    import pandas as pd
    
    error_list = {
        "reason_one": "Reason One",
        "reason_two": "Reason Two",
        "reason_three": "Reason Three",
        "reason_four": "Reason Four",
    }
    
    df = pd.DataFrame(
        {
            "Labels": [
                "reason_two",
                "reason_two",
                "reason_one",
                "cat",
                "reason_four",
                "many",
                "sandwich",
            ]
        }
    )
    
    for reason, column_name in error_list.items():
        df[column_name] = (df["Labels"] == reason).astype(int)
    
    print(df)
    

    prints out

            Labels  Reason One  Reason Two  Reason Three  Reason Four
    0   reason_two           0           1             0            0
    1   reason_two           0           1             0            0
    2   reason_one           1           0             0            0
    3          cat           0           0             0            0
    4  reason_four           0           0             0            1
    5         many           0           0             0            0
    6     sandwich           0           0             0            0
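
The question mentions that one ticket can carry several reasons, in which case an exact `==` comparison would miss them. A minimal sketch of a substring-based variant follows, assuming each Labels cell is a single string containing the reason codes; the comma-separated sample rows are made up for illustration. It uses `Series.str.contains` with `regex=False` for a literal match and `na=False` so that missing labels count as no match.

```python
import pandas as pd

error_list = {
    "reason_one": "Reason One",
    "reason_two": "Reason Two",
}

# Hypothetical data: some rows hold more than one reason code.
df = pd.DataFrame(
    {"Labels": ["reason_one, reason_two", "reason_two", "cat", None]}
)

for reason, column_name in error_list.items():
    # Literal substring match per row; missing labels are treated as False.
    matches = df["Labels"].str.contains(reason, regex=False, na=False)
    df[column_name] = matches.astype(int)

print(df)
```

Because this is a substring test, a code such as reason_one would also match a longer code that contains it (for example a hypothetical reason_one_b), so it relies on the reason codes not being prefixes of each other.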