I am trying to process and update rows in a DataFrame through a function, and then return the DataFrame so I can keep working with it. When I return the DataFrame from the function called via apply, I get back a Series instead of the expected new columns. A simple example is below:
import pandas as pd

df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

def get_item(data):
    comb = pd.DataFrame()
    comb['Newfield'] = data  # create new columns
    comb['AnotherNewfield'] = 'y'
    return pd.DataFrame(comb)
Calling the function using apply:
>>> newdf = df['A'].apply(get_item)
>>> newdf
a A Newfield AnotherNewfield
a adam st...
b A Newfield AnotherNewfield
e sed st...
c A Newfield AnotherNewfield
d dave st...
d A Newfield AnotherNewfield
d dave st...
e A Newfield AnotherNewfield
s NaN st...
f A Newfield AnotherNewfield
m NaN str(...
Name: A, dtype: object
>>> type(newdf)
<class 'pandas.core.series.Series'>
I assume apply() is the wrong tool here, but I'm not sure how I should be updating this DataFrame through a function instead.
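For context: `Series.apply` calls the function once per element, passing a single value (here one string from column A), and collects whatever each call returns into a new Series. Returning a whole DataFrame from each call therefore gives a Series of DataFrames with dtype object, which matches the output above. A minimal sketch of one common alternative, assuming each element should produce one row of new values: return a `pd.Series` instead, which `Series.apply` expands into DataFrame columns, then join the result back onto `df`. The helper name `make_fields` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

def make_fields(value):
    # hypothetical helper: `value` is a single element of column A (a string).
    # Returning a Series makes Series.apply build a DataFrame whose columns
    # are this Series' labels.
    return pd.Series({'Newfield': value, 'AnotherNewfield': 'y'})

new_cols = df['A'].apply(make_fields)  # DataFrame indexed 'a'..'f'
df = df.join(new_cols)                 # attach the new columns by index
print(df)
```

Joining on the index keeps each new value aligned with the row it was computed from.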
Edit: Apologies, it seems I accidentally deleted the sample function in an earlier edit. I've added it back here while I try a few other approaches I found in other posts.
Testing in a slightly different way, with individual variables and returning multiple Series, seems to work, so I'll check whether I can do this in my actual case and update:
def get_item(data):
    value = data  # create new columns
    AnotherNewfield = 'y'
    return pd.Series(value), pd.Series(AnotherNewfield)

df['B'], df['C'] = zip(*df['A'].apply(get_item))
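A slightly simpler sketch of the same `zip(*...)` idea, assuming plain values are what should end up in the new columns: return a tuple of scalars rather than a tuple of one-element Series, so that columns B and C hold strings instead of Series objects.

```python
import pandas as pd

df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

def get_item(data):
    # `data` is one element of column A; return one value per new column
    value = data
    another_newfield = 'y'
    return value, another_newfield

# apply yields a Series of 2-tuples; zip(*...) transposes it into two
# tuples (one per new column), each holding one entry per row of df
df['B'], df['C'] = zip(*df['A'].apply(get_item))
print(df)
```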
You could use groupby with apply to get a DataFrame back from the apply call, like this:
import pandas as pd

# add new column B for groupby - a single group is all the trick needs
df = pd.DataFrame(
    {'A': ['adam', 'ed', 'dra', 'dave', 'sed', 'mike'], 'B': [1, 1, 1, 1, 1, 1]},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

def get_item(data):
    # create an empty DataFrame to be returned
    comb = pd.DataFrame(columns=['Newfield', 'AnotherNewfield'])
    # fill the column from the group's data (Series.append was removed in
    # pandas 2.0; appending to an empty column is just the data itself)
    comb['Newfield'] = data['A'].reset_index(drop=True)
    comb['AnotherNewfield'] = 'y'
    # return the complete DataFrame
    return comb

# group by column B so that apply returns a DataFrame instead of a Series
newdf = df.groupby('B').apply(get_item)
# newdf has a MultiIndex; drop level 0 (the value 1 of column B added by
# the groupby). droplevel returns a new frame, so reassign it
newdf = newdf.droplevel(0)
print(newdf)
Output:
Newfield AnotherNewfield
0 adam y
1 ed y
2 dra y
3 dave y
4 sed y
5 mike y
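Because every row falls into the single group `B == 1`, `get_item` receives the whole DataFrame exactly once. A quick check, assuming a `get_item` that builds `Newfield` with `reset_index(drop=True)` as above, suggests the grouping trick produces the same frame as calling the function on `df` directly (the index type may differ, `Index` versus `RangeIndex`, but the labels are the same):

```python
import pandas as pd

df = pd.DataFrame(
    {'A': ['adam', 'ed', 'dra', 'dave', 'sed', 'mike'], 'B': [1, 1, 1, 1, 1, 1]},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

def get_item(data):
    # build the new columns from the whole of column A at once;
    # reset_index(drop=True) gives the rows a 0..n-1 index
    comb = pd.DataFrame({'Newfield': data['A'].reset_index(drop=True)})
    comb['AnotherNewfield'] = 'y'
    return comb

# the single-group trick from the answer
via_groupby = df.groupby('B').apply(get_item).droplevel(0)
# calling the function on the whole frame directly
direct = get_item(df)
print(via_groupby.equals(direct))
```

So when only one group is involved, the direct call is a simpler way to get the same result.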