python pandas if-statement concatenation

Concatenate on specific condition python



I want to write an if statement that concatenates strings only under a condition, i.e. if cell A1 contains text in a specific format, concatenate; otherwise leave the value as is.

Example: if the bill number looks like CM2/0000/, concatenate it with the month and year from the date column (month-year); otherwise leave the bill number as it is.

Sample Data


Solution

  • You can create a function that does what you need and use df.apply() to run it on every row.

    I use the example data from @Boomer's answer.

    EDIT: You didn't show what is actually in your dataframe. It seems bill_date holds datetime values, whereas I originally used strings, so I now convert the strings to datetime to show how to work with them. The code therefore needs .strftime('%m-%y') (on a single value) or .dt.strftime('%m-%y') (on a whole column) instead of .str[3:].str.replace('/','-'). I couldn't simply use str(x), because the string form of a datetime is 2019-09-15 00:00:00 rather than your 15/09/19.

    import pandas as pd
    
    df = pd.DataFrame({
        'bill_number': ['CM2/0000/', 'CM2/0000', 'CM3/0000/', 'CM3/0000'],
        'bill_date': ['15/09/19', '15/09/19', '15/09/19', '15/09/19']
    })
    # the sample dates are day/month/year, so state the format explicitly
    df['bill_date'] = pd.to_datetime(df['bill_date'], format='%d/%m/%y')
    
    def convert(row):
        if row['bill_number'].endswith('/'):
            # if bill_date held strings instead:
            #return row['bill_number'] + row['bill_date'][3:].replace('/','-')
            return row['bill_number'] + row['bill_date'].strftime('%m-%y')
        else:
            return row['bill_number']
    
    df['bill_number'] = df.apply(convert, axis=1)
    
    print(df)
    

    Result:

          bill_number  bill_date
    0  CM2/0000/09-19 2019-09-15
    1        CM2/0000 2019-09-15
    2  CM3/0000/09-19 2019-09-15
    3        CM3/0000 2019-09-15
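
    The EDIT note distinguishes .strftime() from .dt.strftime(). A minimal sketch of that difference, using made-up sample dates (the names dates, single and whole are only for illustration):

```python
import pandas as pd

# Hypothetical sample values, parsed with an explicit day/month/year format.
dates = pd.to_datetime(pd.Series(['15/09/19', '01/12/20']), format='%d/%m/%y')

# A single element is a Timestamp, so .strftime() is called on it directly.
# This is the situation inside a function passed to df.apply(..., axis=1).
single = dates.iloc[0].strftime('%m-%y')

# A whole column is a Series, so the method sits behind the .dt accessor.
whole = dates.dt.strftime('%m-%y').tolist()

print(single)
print(whole)
```

    In short, row-wise code sees individual Timestamp values, while column-wise code works on the Series and needs .dt.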
    

    A second idea is to create a boolean mask

     mask = df['bill_number'].str.endswith('/')
    

    and then use it to update all matching rows at once

     #df.loc[mask,'bill_number'] = df[mask]['bill_number'] + df[mask]['bill_date'].str[3:].str.replace('/','-')
     df.loc[mask,'bill_number'] = df[mask]['bill_number'] + df[mask]['bill_date'].dt.strftime('%m-%y')
    

    or

     #df.loc[mask,'bill_number'] = df.loc[mask,'bill_number'] + df.loc[mask,'bill_date'].str[3:].str.replace('/','-')
     df.loc[mask,'bill_number'] = df.loc[mask,'bill_number'] + df.loc[mask,'bill_date'].dt.strftime('%m-%y')
    

    The left side needs .loc[mask,'bill_number'] instead of [mask]['bill_number'] to assign the values correctly, but the right side can use either form.
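
    To illustrate the point about the left side, here is a small sketch on a two-row frame (the data and the names after_chained and after_loc are only for illustration). It assumes the usual pandas behaviour that boolean indexing returns a copy, so an assignment through df[mask][...] never reaches df:

```python
import warnings
import pandas as pd

df = pd.DataFrame({
    'bill_number': ['CM2/0000/', 'CM2/0000'],
    'bill_date': pd.to_datetime(['15/09/19', '15/09/19'], format='%d/%m/%y'),
})
mask = df['bill_number'].str.endswith('/')

# Chained indexing: df[mask] is a temporary copy, so this assignment is lost.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas warns about exactly this mistake
    df[mask]['bill_number'] = 'changed'
after_chained = df['bill_number'].tolist()

# .loc selects rows and column in one step and writes into df itself.
df.loc[mask, 'bill_number'] = (
    df.loc[mask, 'bill_number'] + df.loc[mask, 'bill_date'].dt.strftime('%m-%y')
)
after_loc = df['bill_number'].tolist()

print(after_chained)
print(after_loc)
```

    Reading through df[mask]['bill_number'] on the right side is harmless because a copy is all that is needed there.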

    import pandas as pd
    
    df = pd.DataFrame({
        'bill_number': ['CM2/0000/', 'CM2/0000', 'CM3/0000/', 'CM3/0000'],
        'bill_date': ['15/09/19', '15/09/19', '15/09/19', '15/09/19']
    })
    # the sample dates are day/month/year, so state the format explicitly
    df['bill_date'] = pd.to_datetime(df['bill_date'], format='%d/%m/%y')
    
    mask = df['bill_number'].str.endswith('/')
    
    #df.loc[mask,'bill_number'] = df[mask]['bill_number'] + df[mask]['bill_date'].str[3:].str.replace('/','-')
    # or
    #df.loc[mask,'bill_number'] = df.loc[mask,'bill_number'] + df.loc[mask,'bill_date'].str[3:].str.replace('/','-')
    
    df.loc[mask,'bill_number'] = df[mask]['bill_number'] + df[mask]['bill_date'].dt.strftime('%m-%y')
    #or
    #df.loc[mask,'bill_number'] = df.loc[mask,'bill_number'] + df.loc[mask,'bill_date'].dt.strftime('%m-%y')
    
    print(df)
    

    A third idea is to use numpy.where(), which picks between two columns of values element by element

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({
        'bill_number': ['CM2/0000/', 'CM2/0000', 'CM3/0000/', 'CM3/0000'],
        'bill_date': ['15/09/19', '15/09/19', '15/09/19', '15/09/19']
    })
    # the sample dates are day/month/year, so state the format explicitly
    df['bill_date'] = pd.to_datetime(df['bill_date'], format='%d/%m/%y')
    
    df['bill_number'] = np.where(
        df['bill_number'].str.endswith('/'),
        #df['bill_number'] + df['bill_date'].str[3:].str.replace('/','-'),
        df['bill_number'] + df['bill_date'].dt.strftime('%m-%y'),
        df['bill_number'],
    )
    
    print(df)
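
    All three versions test only for a trailing slash. If "a specific format" should be checked more strictly, the mask could be built from a regular expression instead. This is only a sketch: the pattern below (CM, digits, a slash, four digits, a trailing slash) is my guess at the format, and the extra sample rows are invented.

```python
import pandas as pd

df = pd.DataFrame({
    'bill_number': ['CM2/0000/', 'CM2/0000', 'XX/', 'CM3/1234/'],
    'bill_date': pd.to_datetime(['15/09/19'] * 4, format='%d/%m/%y'),
})

# Assumed format: "CM", one or more digits, "/", four digits, trailing "/".
pattern = r'CM\d+/\d{4}/'

# str.fullmatch() is True only when the whole string matches the pattern.
mask = df['bill_number'].str.fullmatch(pattern)

df.loc[mask, 'bill_number'] = (
    df.loc[mask, 'bill_number'] + df.loc[mask, 'bill_date'].dt.strftime('%m-%y')
)
result = df['bill_number'].tolist()

print(result)
```

    Here 'XX/' ends with a slash but is left alone because it does not match the assumed pattern; adjust the pattern to whatever your real bill numbers look like.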