Tags: python, pandas, dataframe, view, copy

Why does .loc/.iloc return a copy, and how can I make it return a view of a pandas DataFrame?


I have a class containing a pandas.DataFrame and a method that returns a subset of its columns. I want to get a view, not a copy.

import pandas

class Data:
    def __init__(self, path, selected_features):
        df = pandas.read_excel(path)
        # df[selected_features] contains only int64 and float64 columns
        self._features = df[selected_features].astype("float64")

    def features(self):
        # for simplicity, return just the first 2 columns
        return self._features.iloc[:, 0:2]

Using np.shares_memory(), I've established that it returns a copy.

data = Data(path, selected_features)
print(np.shares_memory(data._features, data.features()))
# False
print(np.shares_memory(data._features, data._features.iloc[:,0:2]))
# False

I've tried using .loc and it yields the same result.

Why is it returning a copy, and how can I make it return a view?

Note: I've seen the docs and thread 1, thread 2 and thread 3 - none helped me resolve the issue.


Solution

  • There are some errors in your code:

    • data._features does not exist in your code; did you mean data.labels?
    • labels.iloc[0:2] returns 2 rows, not 2 columns.

    Why does .loc/.iloc return a copy

    Or not?

    >>> df = pd.DataFrame(np.random.random((10, 4)), columns=list('ABCD')).copy()
    
    >>> df._is_view
    False
    
    >>> df._is_copy
    None
    
    >>> hex(id(df))
    '0x7fd3c7a5ba60'
    
    >>> df.iloc[:, 0:2]._is_view
    True
    
    >>> df.iloc[:, 0:2]._is_copy
    <weakref at 0x7fd43731c540; to 'DataFrame' at 0x7fd3c7a5ba60>
    
    >>> np.shares_memory(df, df.iloc[:, 0:2])
    True
    
    >>> np.shares_memory(df, df.iloc[:, 0:2].copy())
    False
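
The same check can be sketched with public API only, since _is_view and _is_copy are private attributes and may behave differently between pandas versions. This is a minimal example, assuming a single-dtype frame like the one in the session above; the names col_slice, col_copy, shares_slice and shares_copy are made up for illustration:

```python
import numpy as np
import pandas as pd

# Single-dtype frame, as in the session above: all four columns are float64.
df = pd.DataFrame(np.random.random((10, 4)), columns=list("ABCD"))

col_slice = df.iloc[:, 0:2]        # first two columns, taken by position
col_copy = df.iloc[:, 0:2].copy()  # explicit, independent copy

# Compare the underlying NumPy arrays rather than the DataFrame objects.
shares_slice = np.shares_memory(df.to_numpy(), col_slice.to_numpy())
shares_copy = np.shares_memory(df.to_numpy(), col_copy.to_numpy())

print(shares_slice)
print(shares_copy)
```

With a single-dtype frame the slice is expected to share memory with the source, while the explicit copy never does.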
    

    Note: copy() is important to avoid SettingWithCopyWarning because Pandas keeps a reference to the source DataFrame.
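
To illustrate the note, here is a small sketch (the frame and the name subset are made up for the example) that takes the subset with an explicit copy() and then writes to it. Because the subset owns its data, the original frame is left untouched:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((3, 4)), columns=list("ABCD"))

# Explicit copy: the subset no longer refers back to df.
subset = df.iloc[:, 0:2].copy()
subset.loc[0, "A"] = 1.0

print(subset.loc[0, "A"])  # 1.0
print(df.loc[0, "A"])      # 0.0
```

This is the opposite of what the question asks for: use copy() when you want to modify the subset independently, and skip it when you want changes to be tied to the source.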