I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({
"time": [1, 2, 1, 2],
"site": ['a', 'a', 'b', 'b'],
"val1": [11, 12, 21, 22],
"val2": [101, 102, 201, 202]
})
df.set_index(['time', 'site'], inplace=True, append=False)
df = df.unstack("site")
print df
val1 val2
site a b a b
time
1 11 21 101 201
2 12 22 102 202
I would like change some values that match a boolean filter. e.g.:
ix = df.val1 > 20
print ix
site a b
time
1 False True
2 False True
A natural thing to try would be df.val1[ix] = 50
. This does the expected assignment, but gives a warning: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
.
So now I am trying to achieve something similar using df.loc
. But I can't find any way to use df.loc
with this kind of boolean mask. This seems to be because I am using hierarchical columns, i.e., I don't have much trouble if I only have one set of values (val1). Unfortunately, assignments with boolean filters on hierarchical columns aren't covered very well in the docs.
I've tried referring to df.loc[:,'val1',ix]
, but that gives IndexingError: Too many indexers
. I've tried df.loc[:,'val1'][ix] = 50
, and that works but gives the SettingWithCopyWarning
.
I can use df.val1 = df.val1.where(~ix, other=50)
but this seems unintuitive, inefficient and inflexible (e.g., it couldn't easily be extended to add 10 to the existing values).
Is there some other indexing method I should use to assign values to a hierarchical column of a dataframe, based on a boolean mask?
Edited to extend the question:
I didn't realize this would be an issue, but I'd actually like to filter based on values in both the val1
and val2
columns and change values in both sets of columns, something like this:
ix = (df.val1 > 20) | (df.val2 < 102)
df.val1[ix] = 50
df.val2[ix] = 150
Is there a simple indexing approach that can do this? It's pretty easy with numpy ndarrays, but seems to be much trickier with a pandas dataframe.
you can just use a list to select you column
idx = df[['val1']] > 20
idx
Out[39]:
val1
site a b
time
1 False True
2 False True
df[idx] = 50
df
Out[41]:
val1 val2
site a b a b
time
1 11 50 101 201
2 12 50 102 202