I have a class containing a pandas.DataFrame and a method that returns a subset of its columns. I want to get a view, not a copy.
import numpy as np
import pandas

class Data:
    def __init__(self, path, selected_features):
        df = pandas.read_excel(path)
        self._features = df[selected_features].astype("float64")  # df[selected_features] is only int64 and float64

    def features(self):
        return self._features.iloc[:, 0:2]  # for simplicity let's just return 2 columns
Using np.shares_memory(), I've established that it returns a copy:
data = Data(path, selected_features)
print(np.shares_memory(data._features, data.features()))
# False
print(np.shares_memory(data._features, data._features.iloc[:,0:2]))
# False
I've tried using .loc and it yields the same result.
Why is it returning a copy, and how can I make it return a view?
Note: I've seen the docs and thread 1, thread 2 and thread 3 - none helped me resolve the issue.
There are some errors in your code:
- data._features does not exist in your code, maybe data.labels?
- labels.iloc[0:2] does not return 2 columns but 2 rows.

Why does .loc/.iloc return a copy
Or not?
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.random((10, 4)), columns=list('ABCD')).copy()
>>> df._is_view
False
>>> df._is_copy
None
>>> hex(id(df))
'0x7fd3c7a5ba60'
>>> df.iloc[:, 0:2]._is_view
True
>>> df.iloc[:, 0:2]._is_copy
<weakref at 0x7fd43731c540; to 'DataFrame' at 0x7fd3c7a5ba60>
>>> np.shares_memory(df, df.iloc[0:2])
True
>>> np.shares_memory(df, df.iloc[0:2].copy())
False
Note: copy() is important to avoid SettingWithCopyWarning, because pandas keeps a reference to the source DataFrame.
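If it helps, here is a minimal sketch of the same check as a script. The frames and variable names are made up for illustration, and the results can depend on your pandas version and on how the frame is stored internally, so treat it as an experiment rather than a guarantee. It compares the to_numpy() outputs directly. It also shows one case where a whole-frame comparison reports no shared memory although nothing was sliced: a frame with mixed dtypes has to be converted into a new array on every to_numpy() call.

```python
import numpy as np
import pandas as pd

# Single-dtype frame, like the one in the REPL session above.
df = pd.DataFrame(np.random.random((10, 4)), columns=list("ABCD"))

cols = df.iloc[:, 0:2]               # first two columns
cols_copy = df.iloc[:, 0:2].copy()   # explicit, independent copy

view_shares = np.shares_memory(df.to_numpy(), cols.to_numpy())
copy_shares = np.shares_memory(df.to_numpy(), cols_copy.to_numpy())

# Mixed-dtype frame (hypothetical data): to_numpy() must build a new array
# with a common dtype on each call, so comparing the whole frame with
# itself already reports no shared memory.
mixed = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
mixed_frame_shares = np.shares_memory(mixed.to_numpy(), mixed.to_numpy())

# Comparing one column at a time avoids that conversion.
mixed_column_shares = np.shares_memory(
    mixed["b"].to_numpy(), mixed["b"].to_numpy()
)

print(view_shares, copy_shares, mixed_frame_shares, mixed_column_shares)
```

I can't tell from the question whether this is what happens inside your class, but comparing a single column, as in the last check, is a more direct test than passing whole DataFrames to np.shares_memory().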