python pandas dataframe join pivot-table

Pd pivot table lambda function to join column values with exceptions not working

I am currently working with a dataframe looking at Kentucky oil wells with pandas and want to create a pivot table using an API identifier. Since ther are various duplicates, I also wanted to join non unique values.

Below is an example of the dataframe:

import pandas as pd

df = pd.DataFrame({'API': ['16101030580000', '16101030580000', '16129057600000','16013006300000'], 
                   'Date': ['0000/00/00','6/15/2007', '5/25/2020', '7/31/2014'],
                   'Annual_Oil':[300,'nan',150, 360],
                   'State':['KY','None', 'None', 'KY']})

Additionally, I created a list of values that I did not want to join. However, when runnning the code, I get some values in the dataframe that should not be there.

list_none = ['none', 'nan', 'NAN','None', '0000/00/00','000']
df1 = pd.pivot_table(df, index = 'API',
                        aggfunc = lambda x: (','.join(x.unique().astype(str)) if x not in list_none else x),
                        sort = False)

The output for this example dataframe looks like

                Date                    Annual_Oil State
API         
16101030580000  0000/00/00,6/15/2007    300,nan    KY,None
16129057600000  5/25/2020               150        None
16013006300000  7/31/2014               360        KY

Is there a way to restructure the lambda function within the pivot table or would I have to get rid of the unwanted joins manually?

Solution

You should filter the values within the join:

list_none = ['none', 'nan', 'NAN', 'None', '0000/00/00', '000']
df1 = pd.pivot_table(
    df,
    index='API',
    aggfunc=lambda x: ','.join(
        i for i in x.unique().astype(str) if i not in list_none
    ),
    sort=False,
)

Alternative using a custom function:

def cust_join(x):
    x = x.dropna().astype(str)
    return ','.join(x[~x.isin(list_none)].unique())

df1 = pd.pivot_table(
    df,
    index='API',
    aggfunc=cust_join,
    sort=False,
)

Output:

                     Date Annual_Oil State
API                                       
16101030580000  6/15/2007        300    KY
16129057600000  5/25/2020        150      
16013006300000  7/31/2014        360    KY