
Create a custom column based on what another column's value starts with


I have the following dataframe:

    Person Number Error Department Name Email Country
    CZ 10054609 The identifier 11380151 is used by Veronika Fi... CZ:Supply Chain Pohořelice 1 Henkel Cosmeticos... verca.fialova.2001@gmail.com
    CZ 10054620 The identifier 11380126 is used by Radmila Val... CZ:Supply Chain Pohořelice 1 Henkel VAS (CZM63... rvalova1@seznam.cz
    CZ 10054728 The identifier 11805326 is used by Pavel Pecka... CZ:Supply Chain Pohořelice 3 Levis (CZM630.415... pavlias000@seznam.cz
    CZ 10054699 The identifier 11380232 is used by Sabina Love... CZ:Supply Chain Pohořelice 3 Marks and Spencer... s.loveckova@seznam.cz
    CZ 10054727 The identifier 11805358 is used by Tereza Holč... CZ:Supply Chain Pohořelice 3 Levis (CZM630.415... tholcapko@seznam.cz

I need to create a column named "Error Type" that follows the condition:

  • If the "Error" column starts with "The Identifier", set the value to "Duplicated"
  • If the "Error" column starts with "The data", set the value to "Transaction"

What would be the best way to solve it?


Solution

  • EDIT:

    If there are many different values, create a dictionary for the mapping and set the values in a loop:

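    import pandas as pd

    df = pd.DataFrame({'Error': ['The Identifier 1', 'The Identifier 3', 'The data dd', 'another data']})

    # add all possible prefixes and the value each one maps to
    mapping = {'The Identifier': 'Duplicated', 'The data': 'Transaction'}

    # remove leading/trailing whitespace so startswith matches reliably
    df['Error'] = df['Error'].str.strip()

    # rows that match no prefix are left as NaN
    for k, v in mapping.items():
        df.loc[df['Error'].str.startswith(k), 'new'] = v
    print(df)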
                  Error          new
    0  The Identifier 1   Duplicated
    1  The Identifier 3   Duplicated
    2       The data dd  Transaction
    3      another data          NaN
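One thing to watch: `str.startswith` is case-sensitive, and the sample rows in the question spell the prefix "The identifier" while the stated condition says "The Identifier". A possible variant, sketched below, compares lowercased text and builds the column in one step with `numpy.select`. The sample strings, the `'Other'` fallback label, and the use of `np.select` are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; note the lowercase "identifier" in the first row
df = pd.DataFrame({'Error': ['The identifier 123 is used by someone',
                             'The Identifier 456 is used by someone',
                             ' The data could not be saved',
                             'another data']})

# Prefixes are written in lowercase because the comparison below is case-insensitive
mapping = {'the identifier': 'Duplicated', 'the data': 'Transaction'}

# Normalise once: strip surrounding whitespace and lowercase the text
errors = df['Error'].str.strip().str.lower()

# One boolean mask per prefix; na=False treats missing values as "no match"
conditions = [errors.str.startswith(prefix, na=False) for prefix in mapping]

# Pick the label of the first matching prefix, or the fallback label otherwise
df['Error Type'] = np.select(conditions, list(mapping.values()), default='Other')
print(df['Error Type'].tolist())
```

Unlike the loop, this writes the result to a column named "Error Type" as the question asks, and it gives unmatched rows an explicit label instead of NaN. If NaN is preferred for unmatched rows, the loop version may be the simpler choice.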