I am trying to apply a function to each element of a column of a pandas data frame. This function should return a list of strings. I would like to have each string in the list become its own column. Here is what I have been working with:
def parse_config(string):
    out = []
    pos = list()
    for x in re.finditer(pattern='\.',string=str(string)):
        pos.append(x.start())
    out.append(str(string)[0:pos[-2]])
    out.append(str(string)[pos[-2]+2:pos[-1]-1])
    out.append(str(string)[pos[-1]+1:][0:-1])
    out.append(str(string)[pos[-1]+1:][-1])
    return out
Given a string like 'abc.(e).ghi', this function returns ['abc','e','gh','i'].
I would like each of these list members to be placed in a column of the data frame.
I have tried
df[['a','b','c','d']]=df.apply(lambda x: parse_config(x['configuration']),axis=1)
hoping that the new columns 'a','b','c','d' would be populated with the output of the function. The error I get is:
IndexError: list index out of range
Can someone help me understand what is wrong? I have done essentially the same thing with a function that outputs one scalar (directing output to new column) and that works fine.
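Your attempt with df.apply() was mostly correct, but you need to pass result_type='expand' to apply() so that the returned list is expanded directly into columns. The IndexError itself is most likely raised inside parse_config: for any value containing fewer than two '.' characters, pos[-2] does not exist. The version below returns None values for such rows: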
import pandas as pd
import re

data = {'configuration': ['abc.(e).ghi', 'test.(m).example', 'sample.(d).demo', 'failtest']}
df = pd.DataFrame(data)

def parse_config(string):
    try:
        # Positions of every literal '.' in the value
        pos = [x.start() for x in re.finditer(pattern=r'\.', string=str(string))]
        # Fewer than two dots: pos[-2] would not exist
        if len(pos) < 2:
            return [None, None, None, None]
        out = []
        out.append(str(string)[0:pos[-2]])
        out.append(str(string)[pos[-2]+2:pos[-1]-1])
        out.append(str(string)[pos[-1]+1:][0:-1])
        out.append(str(string)[pos[-1]+1:][-1])
        return out
    except IndexError:
        # For example, nothing follows the last dot
        return [None, None, None, None]

df[['a', 'b', 'c', 'd']] = df.apply(lambda x: parse_config(x['configuration']), axis=1, result_type='expand')
print(df)
which gives
      configuration       a     b       c     d
0       abc.(e).ghi     abc     e      gh     i
1  test.(m).example    test     m  exampl     e
2   sample.(d).demo  sample     d     dem     o
3          failtest    None  None    None  None
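
To see what result_type='expand' changes, it may help to compare the two return shapes side by side. This is only a minimal sketch: the two-row frame and the split_on_dots helper are hypothetical stand-ins for your data and for parse_config, chosen so that every row returns a list of the same length.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({'configuration': ['abc.(e).ghi', 'test.(m).example']})

def split_on_dots(value):
    # Hypothetical stand-in for parse_config: returns a list of three strings
    # for each of the two values above.
    return str(value).split('.')

# Default: each returned list stays a single object, so the result is a
# Series whose elements are lists.
as_series = df.apply(lambda row: split_on_dots(row['configuration']), axis=1)

# With result_type='expand', each list element becomes its own column,
# so the result is a DataFrame.
as_frame = df.apply(lambda row: split_on_dots(row['configuration']),
                    axis=1, result_type='expand')

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

Only the second form has one column per list element, which is what an assignment to df[['a', 'b', 'c', 'd']] needs on the right-hand side.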