I am trying to filter the DataFrame london (which is a copy of another DataFrame, no_eco) so that it keeps only the rows whose 'let' column contains one of the strings passed to the contains() method. The code is as follows:
london = no_eco
london.loc[:,'let'] = london.loc[:,'let'].str.contains('E' or 'D' or 'F' or 'G' or 'H' or 'I' or 'J')
london.loc[:,'let'] = london.loc[:,'let'][london.loc[:,'let']]
london = london.dropna(subset = ['let'])
print(london)
The code works, and the rows that do not match the strings are dropped; however, I receive the following warning:
C:\Users\gerardchurch\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
Even after reading the documentation, I still can't understand what I am doing wrong.
Is it okay to continue using the variable london, or will I encounter problems in the future?
Thanks.
There are several issues with your code:
london = no_eco doesn't assign a copy to london; both names refer to the same DataFrame. Be explicit: london = no_eco.copy().
pd.Series.str.contains supports regex by default, so use str.contains('E|D|F|G|H|I|J'). The expression 'E' or 'D' or ... evaluates to just 'E', so your original call only tests for 'E'.
Your logic is overcomplicated: you replace an object dtype series with a Boolean series, then assign to it a subset indexed by itself, then use dropna, which is designed for null values. Instead, just construct a Boolean series and use pd.DataFrame.loc with Boolean indexing:
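# Work on an independent copy so that no_eco is left untouched
london = no_eco.copy()
# Keep only the rows whose 'let' value contains one of the listed letters
london = london.loc[london['let'].str.contains('E|D|F|G|H|I|J')]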
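To see why the original pattern did not do what was intended, here is a minimal sketch on a made-up DataFrame (the values in 'let' are hypothetical, not your data). Python evaluates 'E' or 'D' or ... to its first truthy operand, so contains() only ever receives 'E', whereas a regex alternation tests every letter.

```python
import pandas as pd

# Hypothetical stand-in for no_eco; the real data is not shown in the question
no_eco = pd.DataFrame({'let': ['A1', 'E2', 'D3', 'B4', 'J5']})

# `or` returns its first truthy operand, so this is just the string 'E'
pattern_from_or = 'E' or 'D' or 'F' or 'G' or 'H' or 'I' or 'J'

# Boolean mask built from the `or` expression: only rows containing 'E' match
mask_or = no_eco['let'].str.contains(pattern_from_or)

# Boolean mask built from a regex alternation: any of the letters matches
mask_regex = no_eco['let'].str.contains('E|D|F|G|H|I|J')

# Boolean indexing keeps the matching rows; .copy() makes the result independent
london = no_eco.loc[mask_regex].copy()
print(london)
```

With this toy data, mask_or selects only the 'E2' row, while mask_regex selects 'E2', 'D3' and 'J5'.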
For this particular case, you can use pd.DataFrame.__getitem__ (df[] syntax) directly:
london = no_eco.copy()
london = london[london['let'].str.contains('E|D|F|G|H|I|J')]
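The first point, that plain assignment does not copy, can be checked directly. This is a small sketch on made-up data (the column values and the names alias and independent are hypothetical): alias is bound to the same object as no_eco, so a change made through one name shows up under the other, while the result of .copy() is unaffected.

```python
import pandas as pd

# Hypothetical stand-in for no_eco
no_eco = pd.DataFrame({'let': ['A1', 'E2', 'D3']})

alias = no_eco               # second name for the same DataFrame, not a copy
independent = no_eco.copy()  # new DataFrame with its own data

# Modify the DataFrame through the alias
alias.loc[0, 'let'] = 'Z9'

print(alias is no_eco)            # True: both names point to one object
print(no_eco.loc[0, 'let'])       # Z9: the change is visible under both names
print(independent.loc[0, 'let'])  # A1: the copy is unaffected
```

This is why any later edits to london should be made on an explicit copy if no_eco is meant to stay as it was.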