SELECT a.year, a.country, b.amount
FROM table_a a
LEFT JOIN table_b b
ON a.country=b.country
AND (CASE WHEN b.country = 'Europe' THEN b.year = 2022 ELSE b.year = 2023 END)
I am trying to transform this code into python. What I have already tried is using pd.merge()
for the left join but unsure how to proceed with using case when in join condition of pandas merge.
Given the following input:
table_a = pd.DataFrame({
'country': ['Europe', 'Europe', 'USA', 'Africa'],
'year': [2022, 2020, 2023, 2021]
})
table_b = pd.DataFrame({
'country': ['Europe', 'USA', 'Africa', 'USA', 'Europe'],
'year': [2023, 2022, 2022, 2023, 2022],
'amount': [10, 20, 30, 40, 50]
})
the output should be:
country year amount
0 Europe 2022 50.0
1 Europe 2020 50.0
2 USA 2023 40.0
3 Africa 2021 NaN
How do I make it work?
Since the condition is used to filter table_b
before merging, we can go ahead and filter table_b
first using the condition and merge later on country
column.
output = (
table_a.merge(
table_b.query("(country == 'Europe' and year == 2022) or (country != 'Europe' and year == 2023)"),
on=['country'], how='left', suffixes=('', '_'))
[['year', 'country', 'amount']]
)
Using Nick's example, the above code produces the following output:
country year amount
0 Europe 2022 50.0
1 USA 2023 40.0
2 Africa 2021 NaN