Tags: python, pandas, dataframe, in-place

How can I modify a dataframe element from a Series defined through df.loc[row]?


I have code where a function/method accepts a Series (a row from df) and is supposed to modify it in place, so that the changes are reflected in the original df. However, I can't seem to force the modification to act on a view rather than a copy. The information in the documentation and in a related question on Stack Overflow does not resolve the issue, as the example below shows:

import pandas as pd
pd.__version__ # 0.24.2

ROW_NAME = "r1"
COL_NAME = "B"
NEW_VAL = 100.0

# df I would like to modify in-place
df = pd.DataFrame({"A":[[1], [2], [3,4]], "B": [1.0, 2.0, 3.0]}, index=["r1", "r2", "r3"])

# a row (Series reference) is the input param to a function that should modify df in-place
record = df.loc[ROW_NAME]
record.loc[COL_NAME] = NEW_VAL
assert df.loc[ROW_NAME, COL_NAME] == NEW_VAL  # fails: df still holds 1.0

The line starting with record.loc triggers the familiar warning "SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame". That might make sense, except that record appears to reference df and can be modified in place under some circumstances. For example:

record = df.loc[ROW_NAME]
record.loc["A"].append(NEW_VAL)
assert NEW_VAL in df.loc["r1", "A"] # True

My question is: how can I force an in-place modification of the float value at df.loc[ROW_NAME, COL_NAME] through the Series record? Bonus points for explaining why column A can be modified in place but column B cannot in the examples above.

Other related questions:


Solution

  • Based on the sources linked in the question and a thorough reading of the documentation, there does not appear to be a way to force pandas to return a view rather than a copy when a Series is taken from a DataFrame row.

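    As @Lilith Schneider points out, the original confusion comes from the fact that record = df.loc["r1"] returns a shallow copy, a hybrid of a copy and a view that can lead to unexpected behavior. Because df mixes a list column and a float column, the row is built as a new object-dtype array whose cells still reference the same Python objects stored in df. Mutating the list in column A (append) therefore shows up in df, whereas assigning a new value to column B only replaces the element in the row's own array and leaves df untouched.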
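The sketch below checks both halves of that explanation on the question's own df and shows one possible workaround. It assumes the mixed-dtype row behaves as described above (a fresh object-dtype array holding references to df's objects). set_field is a hypothetical helper, not a pandas API: since the row cannot be forced to be a view, it writes through df itself with DataFrame.at, using record.name (the row label that df.loc[ROW_NAME] attaches to the returned Series) to locate the cell.

```python
import numpy as np
import pandas as pd

ROW_NAME = "r1"
COL_NAME = "B"
NEW_VAL = 100.0

df = pd.DataFrame({"A": [[1], [2], [3, 4]], "B": [1.0, 2.0, 3.0]},
                  index=["r1", "r2", "r3"])
record = df.loc[ROW_NAME]

# The mixed-dtype row is a freshly allocated object-dtype array, so it
# shares no memory with the float64 storage of column B ...
shares_b = np.shares_memory(record.to_numpy(), df[COL_NAME].to_numpy())

# ... but its cells hold references to the same Python objects, so the
# list in column A is the very object stored in df.
same_list = record["A"] is df.at[ROW_NAME, "A"]

record["A"].append(NEW_VAL)  # mutates the shared list: visible in df
record[COL_NAME] = NEW_VAL   # replaces the row's own element: df unchanged
                             # (older pandas may emit SettingWithCopyWarning)


def set_field(frame, row, col, value):
    """Hypothetical helper: write through the DataFrame, using the row
    Series only to look up its label (row.name)."""
    frame.at[row.name, col] = value


before = df.at[ROW_NAME, COL_NAME]  # still 1.0: the assignment above missed df
set_field(df, record, COL_NAME, NEW_VAL)
after = df.at[ROW_NAME, COL_NAME]

print(shares_b, same_list)   # False True
print(df.at[ROW_NAME, "A"])  # [1, 100.0]
print(before, after)         # 1.0 100.0
```

The design choice here is to pass the DataFrame (or have the function return the new value) instead of relying on the row Series to propagate changes, because only the shared mutable objects in column A, not scalar assignments, ever reach df through record.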