I am having a small issue comparing two Koalas DataFrames, which are defined below.
import databricks.koalas as ks
mini_team_df_1 = ks.DataFrame(['0000340b'], columns = ['team_code'])
mini_receipt_df_2 = ks.DataFrame(['0000340b'], columns = ['team_code'])
mini_receipt_df_2['match_flag'] = mini_receipt_df_2['team_code'].isin(ks.DataFrame(mini_team_df_1))
mini_receipt_df_2
I am running this code on Databricks and expect mini_receipt_df_2 to look like this:
team_code match_flag
0 0000340b True
But the code above actually produces:
team_code match_flag
0 0000340b False
This makes no sense to me: team_code = 0000340b is the same in both DataFrames, so I would expect .isin to return True.
Could someone help me understand what is wrong?
Thank you
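isin expects a list-like of values, not a DataFrame. Iterating over a DataFrame yields its column labels (this is the pandas behaviour, and Koalas appears to follow it), so your value is compared against the label 'team_code' rather than against '0000340b'. Pass the values of the column instead:
mini_receipt_df_2['match_flag'] = mini_receipt_df_2['team_code'].isin(mini_team_df_1['team_code'].to_numpy().tolist())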
Output:
>>> mini_receipt_df_2
team_code match_flag
0 0000340b True
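If you want to see where the False comes from, here is a minimal sketch in plain pandas rather than Koalas. It assumes Koalas mirrors pandas on this point, which I have not verified for every version. The names team, receipt, labels, wrong and right are only for illustration.

```python
import pandas as pd

# Same data as in the question, built with pandas instead of Koalas.
team = pd.DataFrame(['0000340b'], columns=['team_code'])
receipt = pd.DataFrame(['0000340b'], columns=['team_code'])

# Iterating over a DataFrame gives its column labels, not its cell values.
labels = list(team)
print(labels)  # ['team_code']

# Comparing against the labels: '0000340b' is not equal to 'team_code'.
wrong = receipt['team_code'].isin(labels)

# Comparing against the actual values of the column.
right = receipt['team_code'].isin(team['team_code'].tolist())

print(wrong.tolist())  # [False]
print(right.tolist())  # [True]
```

So the lookup itself works; it is just being handed the column name instead of the column contents.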