Search code examples
python-3.xpandasdatecomparison

Pandas comparing values based on month and year


In my current problem statement, I would like to compare values in two different columns which are aligned to a specific month and return a value that would act as a identifier. My dataframe looks like the following:

Account    year   month   value_1  value_2   
A          2021   Jan              9
A          2021   Feb              
A          2021   Mar       7       
A          2021   Apr       8      
A          2021   May              
B          2021   Jan       2       
B          2021   Feb              10
B          2021   Mar       5      
B          2021   Apr       7        

Now in the above scenario, for account A, the value 9 appeared in a earlier month in value_2 column as compared to value_1 and in account B, the value 2 appeared before the value 10 in value_2 column. Essentially I want to compare if value_2 column values appeared before value_1 column for every account and these columns are sorted by month and year already.

What I want to do is to create a resultant dataframe that shows the following:

account      result
A            value_2 appeared before value_1
B            value_1 appeared before value_2

How can I achieve this?


Solution

  • Assuming you will only have at least 1 value per group and never 2 values in the same row... this should work for you.

    def check_order(group):
        value_1_idx = group['value_1'].dropna().index.min() 
        value_2_idx = group['value_2'].dropna().index.min()
        if (value_1_idx.min() < value_2_idx) or ((value_2_idx is np.nan) & (value_1_idx != np.nan)):
            return 'value_1 appeared before value_2'
        elif (value_2_idx < value_1_idx) or ((value_1_idx is np.nan) & (value_2_idx != np.nan)):
            return 'value_2 appeared before value_1'
    
    result = df.groupby('Account').apply(check_order).reset_index(name='result')