Here is an original dataframe,
country2
year 2017 2018 2019
country1 1 2 1 2 3 6
data_provider indicator
prov_1 ind_a 45 30 22 30 30 30
prov_2 ind_a 30 30 30 30 25 30
ind_b 30 32 30 30 30 30
prov_3 ind_b 30 30 30 35 30 28
and I wish to filter the column and finally get a new dataframe,
# item country2
# year 2017 2018 2019
# country1 1 2 1 2 3 6
# data_provider indicator
# prov_1 ind_a 45.0 NaN 22.0 NaN NaN NaN
# prov_2 ind_a NaN NaN NaN NaN NaN NaN
# ind_b NaN 32.0 NaN NaN NaN NaN
# prov_3 ind_b NaN NaN NaN NaN NaN NaN
you can get the original dataframe by,
df = pd.DataFrame(
data={"data_provider": ["prov_1", "prov_1", "prov_2", "prov_2", "prov_3", "prov_3"],
"indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
"unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
"year": ["2017", "2018","2019", "2017","2018","2019"],
"country1": [1, 1, 3, 2, 2, 6],
"country2": [45, 22, 25, 32, 35, 28]
}
)
df = df.pivot_table(
index=['data_provider', 'indicator'],
columns=['year', 'country1'],
fill_value=30
)
df.columns.names = ['item', 'year', 'country1']
here is how I get the new dataframe,
locate the 2 group of target column labels
x1 = df.columns[df.columns.get_level_values(level='year')=='2017']
x2 = df.columns[df.columns.get_level_values(level='year')=='2018']
get newdf1 with condition1 df[x1]>30
newdf1 = df[df[x1] > 30]
get newdf2 with condition2 df[x2]<30
newdf2 = df[df[x2] < 30]
update newdf2 with newdf1
newdf = newdf2.combine_first(newdf1)
in my solution, I first get 2 dataframe after filtering the original dataframe with different conditions and then combine them together. I am wondering if there's a one direct way to achieve the goal.
import pandas as pd
df = pd.DataFrame(
data={
"data_provider": ["prov_1", "prov_1", "prov_2", "prov_2", "prov_3", "prov_3"],
"indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
"unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
"year": ["2017", "2018","2019", "2017","2018","2019"],
"country1": [1, 1, 3, 2, 2, 6],
"country2": [45, 22, 25, 32, 35, 28]
}
)
df = df.pivot_table(
index=['data_provider', 'indicator'],
columns=['year', 'country1'],
fill_value=30
)
df.columns.names = ['item', 'year', 'country1']
cond1 = (df.columns.get_level_values(level='year') == '2017') & df.gt(30)
cond2 = (df.columns.get_level_values(level='year') == '2018') & df.lt(30)
df_new = df.where(cond1 | cond2)
print(df_new)
Output :
country2
year 2017 2018 2019
country1 1 2 1 2 3 6
data_provider indicator
prov_1 ind_a 45.0 NaN 22.0 NaN NaN NaN
prov_2 ind_a NaN NaN NaN NaN NaN NaN
ind_b NaN NaN NaN NaN NaN NaN
prov_3 ind_b NaN NaN NaN NaN NaN NaN