python-3.x, dataframe, indexing, multiple-columns, data-cleaning

I'm trying to clean my data but it returns the wrong column


I am trying to take one of my imported data sets, df19, and clean it to create a second variable, noneu19, where, you guessed it, EU countries are removed from the column Destination.

Here is what I ran:

noneu19=df19
noneu19["Destination"] = noneu19[~noneu19["Destination"].apply(str).str.contains('UK')]
noneu19["Destination"] = noneu19[~noneu19["Destination"].apply(str).str.contains('SWEDEN')]
noneu19["Destination"] = noneu19[~noneu19["Destination"].apply(str).str.contains('SPAIN')]
...
set(noneu19["Destination"])

(The ... replaces the 25 other lines)

What it returns is the data from a completely separate column, 'Location', for some reason.

If I do set(df19['Destination']) it returns the list that I am trying to clean, so it is not a problem in the original data set. Is there a way that I can do it easier/cleaner/better or a way to troubleshoot why it is returning the wrong column?

Thanks


Solution

  • You can create a list with all the countries in the EU, such as:

    EU = ['SPAIN', 'ITALY', ..., 'EU_COUNTRY']
    

    Then use the isin function like this:

    noneu19 = df19.loc[~df19["Destination"].isin(EU)].copy()
    

    isin checks, for each element of that column, whether it appears in the list you pass as the argument, and returns a boolean mask. The ~ inverts the mask, so only the rows whose Destination is not in the list are kept.

    Approached this way, the code is more readable and easier to maintain.
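
As a minimal sketch of the approach above: the sample rows and the two-country EU list below are made up for illustration (the real df19 comes from your import, and the full list is yours to fill in). Two things in the original attempt are worth noting. noneu19 = df19 does not copy anything, so both names refer to the same DataFrame. And noneu19[~mask] is a whole filtered DataFrame, so assigning it to the single column "Destination" is not the row filter that was intended. Also, isin matches whole values exactly, whereas str.contains matches substrings, so the sketch normalizes case and whitespace first on the assumption that the raw values may be inconsistent.

```python
import pandas as pd

# Hypothetical sample data; the real df19 comes from the asker's import.
df19 = pd.DataFrame({
    "Location": ["A", "B", "C", "D"],
    "Destination": ["SPAIN", "USA", " italy ", "JAPAN"],
})

# Illustrative subset only, not the full list of EU countries.
EU = ["SPAIN", "ITALY"]

# Normalize to upper-case strings without surrounding whitespace
# (an assumption about how messy the raw values might be).
destination = df19["Destination"].astype(str).str.strip().str.upper()

# Keep only the rows whose Destination is NOT in the list.
# .copy() makes noneu19 independent of df19.
noneu19 = df19.loc[~destination.isin(EU)].copy()

print(sorted(set(noneu19["Destination"])))  # ['JAPAN', 'USA']
```

Filtering the rows with a single mask replaces the 28 separate lines, and df19 itself is left unchanged.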