I am trying to take one of my imported data sets df19
and clean information out of it to create a second variable noneu19
where, you guessed it, EU countries are removed from the column Destination.
Here is what I ran:
noneu19=df19
noneu19["Destination"] = noneu19[~noneu19["Destination"].apply(str).str.contains('UK')]
noneu19["Destination"] = noneu19[~noneu19["Destination"].apply(str).str.contains('SWEDEN')]
noneu19["Destination"] = noneu19[~noneu19["Destination"].apply(str).str.contains('SPAIN')]
...
set(noneu19["Destination"])
(The ... replaces the 25 other lines.)
What it returns, for some reason, is the data from a completely separate column, 'Location'.
If I do set(df19['Destination'])
it returns the list that I am trying to clean, so the problem is not in the original data set. Is there an easier or cleaner way to do this, or a way to troubleshoot why it is returning the wrong column?
Thanks
You can create a list
with all the countries in the EU, such as
EU = ['SPAIN', 'ITALY', ..., 'EU_COUNTRY']
then use the isin
function like this:
noneu19 = df19.loc[~df19["Destination"].isin(EU)].copy()
The function isin
checks, for each element of that column, whether it is contained in the list
you pass as the argument.
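To make that concrete, here is a minimal sketch. The sample rows and the shortened EU list are made up for illustration (your real df19 and the full country list will differ); only the isin call and the ~ negation come from the line above.

```python
import pandas as pd

# Hypothetical sample data; the real df19 comes from your own import.
df19 = pd.DataFrame({
    "Location": ["A", "B", "C", "D"],
    "Destination": ["SPAIN", "USA", "UKRAINE", "UK"],
})

# Deliberately partial list for illustration, not the full set of countries.
EU = ["SPAIN", "SWEDEN", "UK"]

# isin returns a boolean Series: True where the value equals an entry of EU.
is_eu = df19["Destination"].isin(EU)

# ~ inverts the mask, .loc keeps the remaining rows, .copy() detaches the result
# so later changes to noneu19 do not touch df19.
noneu19 = df19.loc[~is_eu].copy()

print(sorted(set(noneu19["Destination"])))  # ['UKRAINE', 'USA']
```

Note that isin compares whole values, so 'UKRAINE' is kept here, whereas a substring test such as str.contains('UK') would have dropped it as well.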
Approaching the problem this way gives you code that is more readable and easier to maintain.
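As for troubleshooting the original attempt, two things stand out, and the sketch below (again with made-up sample data) illustrates both. First, noneu19=df19 does not copy anything; both names point to the same DataFrame. Second, noneu19[mask] selects rows and returns a whole DataFrame with every column, so assigning it back to the single column "Destination" does not filter rows. What pandas does with such an assignment depends on the version, so I would treat that as the likely reason values from another column showed up rather than a confirmed diagnosis.

```python
import pandas as pd

# Hypothetical sample data standing in for the imported data set.
df19 = pd.DataFrame({
    "Location": ["A", "B", "C"],
    "Destination": ["SPAIN", "USA", "UK"],
})

# Plain assignment binds a second name to the same object; no copy is made.
noneu19 = df19
print(noneu19 is df19)  # True

# Indexing with a boolean mask selects rows and keeps all columns.
mask = ~df19["Destination"].astype(str).str.contains("UK")
rows = df19[mask]
print(type(rows).__name__)  # DataFrame
print(list(rows.columns))   # ['Location', 'Destination']
```

So the filtered rows should be assigned to the new variable itself (noneu19 = ...), not to one of its columns.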