Tags: python, pandas, syntax-error, dummy-variable

pandas get_dummies syntax error


I have a dataset of about 30k records with a column titled "Native Country". I want to create a new variable for every unique value in that column, because the algorithm I am using can only handle numeric values, so I need to convert the text into binary (0/1) columns.

When I use the following:

    Native Country = pd.get_dummies(dataset.Native Country , prefix='Native Country' )
    Native Country.head()

I get the following error message:

    SyntaxError: invalid syntax

Any suggestions, please?


Solution

  • Python identifiers cannot contain whitespace, so use an underscore instead of a space in variable names. You also have to access the column with square brackets (dataset['Native Country']) instead of dot notation when the column name contains a space.

    In [1]: import pandas as pd
    
    In [2]: dataset = pd.DataFrame({'Native Country': ['a', 'b', 'a']})
    
    In [6]: # dtype=int gives 0/1 columns; pandas >= 2.0 otherwise returns True/False
       ...: native_country = pd.get_dummies(dataset['Native Country'], prefix='Native Country', dtype=int)
    
    In [7]: native_country.head()
    Out[7]:
       Native Country_a  Native Country_b
    0                 1                 0
    1                 0                 1
    2                 1                 0
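Since the algorithm needs an all-numeric table, it may be more convenient to encode the column inside the full DataFrame instead of building a separate frame. Below is a minimal sketch, assuming a toy frame with a hypothetical Age column standing in for the real data. Passing columns= to pd.get_dummies replaces only the listed column with indicator columns and keeps the others. Replacing spaces in the resulting column names with underscores makes attribute access valid Python again.

```python
import pandas as pd

# Hypothetical toy data standing in for the real ~30k-record dataset.
dataset = pd.DataFrame({
    'Age': [39, 50, 38],
    'Native Country': ['a', 'b', 'a'],
})

# columns= encodes only 'Native Country'; the text column is dropped and
# indicator columns named 'Native Country_<value>' are appended.
encoded = pd.get_dummies(dataset, columns=['Native Country'], dtype=int)

# Swap spaces for underscores so names are valid identifiers and
# attribute access (encoded.Native_Country_a) works.
encoded.columns = encoded.columns.str.replace(' ', '_', regex=False)

print(encoded.columns.tolist())
# ['Age', 'Native_Country_a', 'Native_Country_b']
print(encoded.Native_Country_a.tolist())
# [1, 0, 1]
```

The non-encoded columns come first, followed by the new indicator columns, so the frame can be passed to a numeric-only algorithm as is.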