python pandas databricks azure-databricks spark-koalas

Check if two dataframes have the same values in the column using .isin in koalas dataframe


I am having a small issue comparing two dataframes, shown below. Both are koalas dataframes.

import databricks.koalas as ks


mini_team_df_1 = ks.DataFrame(['0000340b'], columns = ['team_code'])

mini_receipt_df_2 = ks.DataFrame(['0000340b'], columns = ['team_code'])

mini_receipt_df_2['match_flag'] = mini_receipt_df_2['team_code'].isin(ks.DataFrame(mini_team_df_1))

mini_receipt_df_2

I am running this code on Databricks and expect mini_receipt_df_2 to look like this:

    team_code   match_flag
0   0000340b     True

But the code above actually produces:

    team_code   match_flag
0   0000340b     False

This makes no sense to me, since .isin should return True for team_code = 0000340b, which appears in both dataframes.

Could someone help me understand what is wrong?

Thank you


Solution

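  • Try this. np.isin(element, test_elements) returns one boolean per entry of its first argument, so the receipt codes go first and the team codes second:

    import numpy as np
    mini_receipt_df_2['match_flag'] = np.isin(mini_receipt_df_2['team_code'].to_numpy(), mini_team_df_1['team_code'].to_numpy())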
    

    Output:

    >>> mini_receipt_df_2
      team_code  match_flag
    0  0000340b        True
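
A likely reason for the False in the question: koalas, like pandas, iterates a DataFrame by column label. If Series.isin turns its argument into a list (an assumption about koalas internals, not something confirmed above), the lookup values become ['team_code'] rather than ['0000340b']. The sketch below uses plain pandas, chosen only so that it runs without a Spark cluster, to show the label iteration and an alternative that passes the column's values as a list. The names team_df, receipt_df, labels and team_codes are illustrative.

```python
import pandas as pd

team_df = pd.DataFrame(['0000340b'], columns=['team_code'])
receipt_df = pd.DataFrame(['0000340b'], columns=['team_code'])

# Iterating a DataFrame yields its column labels, not its cell values.
labels = list(team_df)

# Pass the values of the column explicitly as a plain Python list.
team_codes = team_df['team_code'].tolist()
receipt_df['match_flag'] = receipt_df['team_code'].isin(team_codes)

print(labels)
print(receipt_df)
```

In koalas the equivalent would presumably be mini_receipt_df_2['team_code'].isin(mini_team_df_1['team_code'].to_numpy().tolist()). Note that to_numpy collects the team codes to the driver, so this suits small lookup tables rather than large ones.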