Search code examples
pythonpandasperformance

Speed up Pandas: find all columns which fullfill set of conditions


I have data represented using pandas DataFrame, which for example looks as follows:

| id | entity | name | value | location

where id is an integer value, entity is an integer , name is a string, value is an integer, and location is a string (for example US, CA, UK etc).

Now, I want to add a new column to this data frame, column "flag", where values are assigned as follows:

for d in df.iterrows():

    if d.entity == 10 and d.value != 1000 and d.location == CA:
        d.flag = "A" 
    elif d.entity != 10 and d.entity != 0 and d.value == 1000 and d.location == US:
        d.flag = "C"
    elif d.entity == 0 and d.value == 1000 and d.location == US"
        d.flag = "B"
    else:
        print("Different case")

Is there a way to speed this up and use some built in functions instead of the for loop?


Solution

  • Use np.select which you pass a list of conditions, based on those conditions you give it choices and you can specify a default value when none of the conditions is met.

    conditions = [
        (d.entity == 10) & (d.value != 1000) & (d.location == 'CA'),
        (d.entity != 10) & (d.entity != 0) & (d.value == 1000) & (d.location == 'US'),
        (d.entity == 0) & (d.value == 1000) & (d.location == 'US')
    ]
    
    choices = ["A", "C", "B"]
    
    df['flag'] = np.select(conditions, choices, default="Different case")