Tags: python-3.x, string, pandas, apply, pandas-apply

Create columns with .apply() Pandas with strings


I have a Dataframe df.

One of the columns is named Adress and contains a string.

I have created a function processing(string) that takes a string as its argument and returns part of that string.

I managed to apply the function to df and create a new column in df with:

df.loc[:, 'new_col_name'] = df.loc[:, 'Adress'].apply(processing)

I then modified processing(string) so that it returns two strings. I would like the second returned string to be stored in another new column. To do so, I tried to follow the steps given in: Create multiple pandas DataFrame columns from applying a function with multiple returns

Here is an example of my function processing(string):

def processing(string):
    #some processing
    return [A_string, B_string]

I also tried to return the two strings in a tuple.

Here are the different ways I tried to apply the function to my df:

df.loc[:, '1st_new_col'], df.loc[:, '2nd_new_col'] = df.loc[:, 'Adress'].apply(processing)
>>> ValueError: too many values to unpack (expected 2)

#or

df.loc[:, '1st_new_col'], df.loc[:, '2nd_new_col'] = df.loc[:, 'Adress'].astype(str).apply(processing)
>>> ValueError: too many values to unpack (expected 2)

#or

df.loc[:, ['1st_new_col', '2nd_new_col']] = df.loc[:, 'Adress'].apply(processing)
>>> KeyError: "None of [Index(['1st_new_col', '2nd_new_col'], dtype='object')] are in the [columns]"

#or

df.loc[:, ['1st_new_col', '2nd_new_col']] = df.loc[:, 'Adress'].apply(processing, axis=1)
>>> TypeError: processing() got an unexpected keyword argument 'axis'

#or

df.loc[:, ['1st_new_col', '2nd_new_col']] = df.apply(lambda x: processing(x['Adress']), axis=1)
>>> KeyError: "None of [Index(['1st_new_col', '2nd_new_col'], dtype='object')] are in the [columns]"

#or

df.loc[:, ['1st_new_col', '2nd_new_col']] = df.apply(lambda x: processing(x['Adress'].astype(str)), axis=1)
>>> AttributeError: 'str' object has no attribute 'astype'
# This is the only error I could understand

#or

df.loc[:, ['1st_new_col', '2nd_new_col']] = df.apply(lambda x: processing(x['Adress']))
>>> KeyError: 'Adress'

I think I am close, but I cannot figure out how to get it working.


Solution

  • Try:

     df["Adress"].apply(process)
    

    Also, it is better to return a pd.Series from the function passed to apply, so that each returned element becomes its own column.

    Here is an example:

    import pandas as pd
    
    # Build an example DataFrame
    df = pd.DataFrame(data={'Adress': ['Word_1_1 Word_1_2', 'Word_2_1 Word_2_2', 'Word_3_1 Word_3_2', 'Word_4_1 Word_4_2']})
    print(df)
    #               Adress
    # 0  Word_1_1 Word_1_2
    # 1  Word_2_1 Word_2_2
    # 2  Word_3_1 Word_3_2
    # 3  Word_4_1 Word_4_2
    
    # Define your own function: here it returns two elements
    def process(my_str):
        parts = my_str.split(" ")
        # Returning a pd.Series lets apply() spread the elements across columns
        return pd.Series(parts)
    
    # Apply the function and store the output in two new columns
    df[["new_col_1", "new_col_2"]] = df["Adress"].apply(process)
    print(df)
    #               Adress new_col_1 new_col_2
    # 0  Word_1_1 Word_1_2  Word_1_1  Word_1_2
    # 1  Word_2_1 Word_2_2  Word_2_1  Word_2_2
    # 2  Word_3_1 Word_3_2  Word_3_1  Word_3_2
    # 3  Word_4_1 Word_4_2  Word_4_1  Word_4_2
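If processing() has to keep returning a list or tuple, as in the question, the result of Series.apply is a single Series holding one pair per row. Unpacking that Series into two targets iterates over rows, not over the two returned values, which is why "too many values to unpack (expected 2)" appears as soon as the frame has more than two rows. The sketch below shows two possible ways to turn the pairs into columns. The body of processing() is a hypothetical stand-in (the question does not show the real one), and the three-row frame is illustrative data only.

```python
import pandas as pd

# Illustrative data; the column name "Adress" is taken from the question.
df = pd.DataFrame(
    {"Adress": ["Word_1_1 Word_1_2", "Word_2_1 Word_2_2", "Word_3_1 Word_3_2"]}
)


def processing(string):
    # Hypothetical stand-in for the question's function: returns two strings.
    first, second = string.split(" ", 1)
    return first, second


# One Series whose elements are tuples (one tuple per row).
pairs = df["Adress"].apply(processing)

# Option 1: zip(*pairs) regroups the row-wise pairs into two column-wise
# sequences, which can then be unpacked into two new columns.
df["1st_new_col"], df["2nd_new_col"] = zip(*pairs)

# Option 2: build a separate DataFrame from the pairs, reusing the original
# index so that it lines up with df row by row.
expanded = pd.DataFrame(
    pairs.tolist(),
    index=df.index,
    columns=["1st_new_col", "2nd_new_col"],
)

print(df)
print(expanded)
```

Both options call processing() once per row and avoid building a pd.Series for every row, which may matter on large frames. Returning a pd.Series, as in the answer above, remains the more direct choice when the function can be changed.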