Search code examples
joinleft-joinrelational-databaseinner-join

What would be correct join to use on the dataset?


I have two datasets: First Data Set

     Date      Revenue  Country  Platform   Application
    2018-05-18   200       US     Google      XYZ
    2018-05-18   100       US      IOS        XYZ
    2018-05-18   200       AU     Google      XYZ
    2018-05-18   100       US     Amazon      XYZ

Second Data Set

  Date      Platform  Application  Country  ConversionRate
2018-05-18   Google       XYZ        US         0.56
2018-05-18   Amazon       XYZ        US         0.75
2018-05-18   Samsung      XYZ        US         0.26
2018-05-18     IOS        XYZ        US         0.45

I am trying to work out the appropriate join for the data. The main data is from First Data set. I want the conversion rate from the second dataset with all the data from the first one and the value should be joined based on date as well as the application, platform, and country. I have tried inner join and left join but the output doen't seems right. Any help is appreciated.

The output that I am getting:

  Date      Revenue   Country   Platform  Application ConversionRate
2018-05-18    200        US      Google      XYZ           0.56
2018-05-18    200        US      Google      XYZ           0
2018-05-18    200        US      Google      XYZ           0
2018-05-18    200        US      Google      XYZ           0

the conversion rate should be same for a particular app(country, platform etc.) on a particular day. Even if the Google Platform is there multiple times, the conversion rate should repeat that number of times.


Solution

  • Figured out the issue. Just as we can add and condition in SQL, we can do that in Google DataPrep. That was the issue I was encountering. Simply added the multiple columns to check during the join operation and that seems to have solved the issue.