Tags: python, pandas, lambda, apply

Python pandas dataframe using apply to call function with list output


I am trying to apply a function to each element of a column of a pandas data frame. This function should return a list of strings. I would like to have each string in the list become its own column. Here is what I have been working with:

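    import re

    def parse_config(string):
        out = []
        pos = list()
        # record the index of every '.' in the string
        for x in re.finditer(pattern=r'\.', string=str(string)):
            pos.append(x.start())
        out.append(str(string)[0:pos[-2]])            # before the second-to-last dot
        out.append(str(string)[pos[-2]+2:pos[-1]-1])  # inside the parentheses
        out.append(str(string)[pos[-1]+1:][0:-1])     # after the last dot, minus the final character
        out.append(str(string)[pos[-1]+1:][-1])       # the final character
        return out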

Given a string like 'abc.(e).ghi', this function returns ['abc', 'e', 'gh', 'i'].

I would like each of these list members to be placed in a column of the data frame.

I have tried

    df[['a', 'b', 'c', 'd']] = df.apply(lambda x: parse_config(x['configuration']), axis=1)

hoping that the new columns 'a', 'b', 'c' and 'd' would be populated with the output of the function. The error I get is: IndexError: list index out of range
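To see where the IndexError most likely comes from, the sketch below runs the question's parse_config (unchanged apart from a raw-string regex) on the example string and on two values with fewer than two dots. The inputs 'failtest' and a missing value (NaN) are hypothetical; the assumption is that the real column contains something similar.

```python
import re

def parse_config(string):
    # Same logic as the question's function, with a raw-string pattern
    out = []
    pos = [m.start() for m in re.finditer(r'\.', str(string))]
    out.append(str(string)[0:pos[-2]])
    out.append(str(string)[pos[-2]+2:pos[-1]-1])
    out.append(str(string)[pos[-1]+1:][0:-1])
    out.append(str(string)[pos[-1]+1:][-1])
    return out

print(parse_config('abc.(e).ghi'))

# A value with fewer than two dots (str(nan) is just 'nan') leaves pos
# with fewer than two entries, so pos[-2] raises IndexError
for value in ['failtest', float('nan')]:
    try:
        parse_config(value)
    except IndexError as exc:
        print(repr(value), '->', exc)
```

One such row anywhere in the column is enough to make the whole apply() call fail.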

Can someone help me understand what is wrong? I have done essentially the same thing with a function that returns a single scalar (assigning the output to a new column), and that works fine.
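The difference from the scalar case shows up in what df.apply(..., axis=1) returns. In this small sketch, first_part and split_parts are hypothetical helpers (split_parts mimics parse_config for well-formed values only, i.e. exactly two dots):

```python
import pandas as pd

df = pd.DataFrame({'configuration': ['abc.(e).ghi', 'test.(m).example']})

def first_part(s):
    # hypothetical scalar-returning function: one value per row
    return s.split('.')[0]

def split_parts(s):
    # hypothetical stand-in for parse_config: assumes exactly two dots
    head, middle, tail = s.split('.')
    return [head, middle.strip('()'), tail[:-1], tail[-1]]

# Scalar per row -> Series of scalars, which fits into one new column
df['first'] = df.apply(lambda row: first_part(row['configuration']), axis=1)

# List per row with the default result_type -> a Series whose elements are lists
as_series = df.apply(lambda row: split_parts(row['configuration']), axis=1)
print(type(as_series).__name__, as_series.iloc[0])

# result_type='expand' -> a DataFrame with columns 0..3, one per list element,
# which can be assigned to four new columns
expanded = df.apply(lambda row: split_parts(row['configuration']), axis=1,
                    result_type='expand')
df[['a', 'b', 'c', 'd']] = expanded
print(df)
```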


Solution

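  • There are two separate problems. The IndexError most likely comes from parse_config itself: any value with fewer than two dots (for example a missing value, which str() turns into 'nan') leaves pos with fewer than two entries, so pos[-2] fails. Once that is handled, the df.apply() call is mostly correct, but when the applied function returns a list, apply(axis=1) returns a Series of lists by default. Pass result_type='expand' to turn each list into columns: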

    import re

    import pandas as pd

    data = {'configuration': ['abc.(e).ghi', 'test.(m).example', 'sample.(d).demo', 'failtest']}
    df = pd.DataFrame(data)

    def parse_config(string):
        s = str(string)
        # positions of every '.' in the string
        pos = [m.start() for m in re.finditer(r'\.', s)]
        if len(pos) < 2:
            # too few dots to split: fill all four columns with None
            return [None, None, None, None]
        try:
            return [
                s[:pos[-2]],                 # before the second-to-last dot
                s[pos[-2] + 2:pos[-1] - 1],  # inside the parentheses
                s[pos[-1] + 1:-1],           # after the last dot, minus the final character
                s[pos[-1] + 1:][-1],         # final character (IndexError if nothing follows the last dot)
            ]
        except IndexError:
            return [None, None, None, None]

    # result_type='expand' turns each returned list into one row of four columns
    df[['a', 'b', 'c', 'd']] = df.apply(lambda x: parse_config(x['configuration']), axis=1, result_type='expand')

    print(df)
    

    which prints:

          configuration       a     b       c     d
    0       abc.(e).ghi     abc     e      gh     i
    1  test.(m).example    test     m  exampl     e
    2   sample.(d).demo  sample     d     dem     o
    3          failtest    None  None    None  None
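As an alternative, if every well-formed value follows the shape head.(middle).tail, a single regular expression with Series.str.extract can split the column without a row-wise apply. This is a sketch under that assumption: the pattern is illustrative, not taken from the question, and it can disagree with parse_config on unusual inputs (for example dots inside the parentheses). Rows that don't match get NaN rather than None.

```python
import pandas as pd

df = pd.DataFrame({'configuration': ['abc.(e).ghi', 'test.(m).example',
                                     'sample.(d).demo', 'failtest']})

# Assumed format: <head>.(<middle>).<tail>; the tail's last character is
# captured separately, matching the 'c'/'d' split in parse_config
pattern = r'^(.*)\.\((.*)\)\.(.*)(.)$'

parts = df['configuration'].str.extract(pattern)  # one column per capture group
parts.columns = ['a', 'b', 'c', 'd']
df = df.join(parts)  # aligns on the shared index
print(df)
```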