Say I have the following multi-index dataframe:
import numpy as np
import pandas as pd

# Level 0: group label, level 1: date
arrays = [np.array(['bar', 'bar', 'bar', 'bar', 'foo', 'foo', 'foo', 'foo']),
          pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
                          '2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04'])]
df = pd.DataFrame(np.zeros((8, 4)), index=arrays)
                  0    1    2    3
bar 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0
foo 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0
How do I select only the part of this dataframe where the first index level is 'bar' and the date is later than 2020-01-02, so that I can add 1 to that part?
To be clearer, the expected output would be:
                  0    1    2    3
bar 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  1.0  1.0  1.0  1.0
    2020-01-04  1.0  1.0  1.0  1.0
foo 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0
I managed to slice it by the first index level:
df.loc['bar']
But then I am not able to apply the condition on the date.
You can compare each index level separately, combine the two boolean masks with &, and then set the matching rows to 1. The : in DataFrame.loc selects all columns:
m1 = df.index.get_level_values(0) == 'bar'
m2 = df.index.get_level_values(1) > '2020-01-02'
df.loc[m1 & m2, :] = 1
print(df)
                  0    1    2    3
bar 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  1.0  1.0  1.0  1.0
    2020-01-04  1.0  1.0  1.0  1.0
foo 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0
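Since the question asks to add 1 rather than overwrite the values, here is a minimal sketch of the same mask approach using += instead of plain assignment. On an all-zero frame the two give the same result, but they would differ once the selected rows hold other values. Comparing against an explicit pd.Timestamp instead of a string is my own choice here, not something the original answer requires.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question
arrays = [np.array(['bar'] * 4 + ['foo'] * 4),
          pd.to_datetime(['2020-01-01', '2020-01-02',
                          '2020-01-03', '2020-01-04'] * 2)]
df = pd.DataFrame(np.zeros((8, 4)), index=arrays)

# One boolean mask per index level
m1 = df.index.get_level_values(0) == 'bar'
m2 = df.index.get_level_values(1) > pd.Timestamp('2020-01-02')

# Increment (rather than overwrite) the selected rows, all columns
df.loc[m1 & m2, :] += 1
print(df)
```

Running the last line a second time would leave 2.0 in those rows, which is the difference from assigning 1.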