I am currently working with a dataframe looking at Kentucky oil wells with pandas
and want to create a pivot table
using an API identifier. Since ther are various duplicates, I also wanted to join
non unique values.
Below is an example of the dataframe:
import pandas as pd
df = pd.DataFrame({'API': ['16101030580000', '16101030580000', '16129057600000','16013006300000'],
'Date': ['0000/00/00','6/15/2007', '5/25/2020', '7/31/2014'],
'Annual_Oil':[300,'nan',150, 360],
'State':['KY','None', 'None', 'KY']})
Additionally, I created a list of values that I did not want to join. However, when runnning the code, I get some values in the dataframe that should not be there.
list_none = ['none', 'nan', 'NAN','None', '0000/00/00','000']
df1 = pd.pivot_table(df, index = 'API',
aggfunc = lambda x: (','.join(x.unique().astype(str)) if x not in list_none else x),
sort = False)
The output for this example dataframe looks like
Date Annual_Oil State
API
16101030580000 0000/00/00,6/15/2007 300,nan KY,None
16129057600000 5/25/2020 150 None
16013006300000 7/31/2014 360 KY
Is there a way to restructure the lambda function within the pivot table or would I have to get rid of the unwanted joins manually?
You should filter the values within the join
:
list_none = ['none', 'nan', 'NAN', 'None', '0000/00/00', '000']
df1 = pd.pivot_table(
df,
index='API',
aggfunc=lambda x: ','.join(
i for i in x.unique().astype(str) if i not in list_none
),
sort=False,
)
Alternative using a custom function:
def cust_join(x):
x = x.dropna().astype(str)
return ','.join(x[~x.isin(list_none)].unique())
df1 = pd.pivot_table(
df,
index='API',
aggfunc=cust_join,
sort=False,
)
Output:
Date Annual_Oil State
API
16101030580000 6/15/2007 300 KY
16129057600000 5/25/2020 150
16013006300000 7/31/2014 360 KY