I have a dataset with 30k rows. It has a column titled "Native Country", and I want to create a new variable for every unique value in that column (the algorithm I am using can only handle numeric values, so I need to convert the text to binary form).
When I use the following:
Native Country = pd.get_dummies(dataset.Native Country , prefix='Native Country' )
Native Country.head()
I get the following error message:
SyntaxError: invalid syntax
Any suggestions, please?
Python identifiers cannot contain whitespace, so use an underscore instead of a space in variable names. If a column name contains whitespace, you also have to access the column with […] instead of . (for example, dataset['Native Country'] rather than dataset.Native Country).
In [1]: import pandas as pd
In [2]: dataset = pd.DataFrame({'Native Country': ['a', 'b', 'a']})
In [6]: native_country = pd.get_dummies(dataset['Native Country'], prefix='Native Country')
In [7]: native_country.head()
Out[7]:
   Native Country_a  Native Country_b
0                 1                 0
1                 0                 1
2                 1                 0
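If the goal is to feed the whole table to a numeric-only algorithm, the indicator columns usually need to go back into the original frame. The following is a minimal sketch of one way to do that, not the only one. The "Age" column and its values are made up for illustration, and renaming the columns to use underscores is an optional extra step that makes dot access work afterwards.

```python
import pandas as pd

# Toy stand-in for the real 30k-row dataset; "Age" is a hypothetical extra column.
dataset = pd.DataFrame({
    "Native Country": ["a", "b", "a"],
    "Age": [39, 50, 38],
})

# Bracket access works for any column name, including names with spaces.
# dtype=int requests 0/1 integers for the indicator columns.
native_country = pd.get_dummies(
    dataset["Native Country"], prefix="Native Country", dtype=int
)

# Swap the text column for its indicator columns.
encoded = dataset.drop(columns="Native Country").join(native_country)

# Optional: replace spaces with underscores so dot access is valid Python.
encoded.columns = encoded.columns.str.replace(" ", "_", regex=False)

print(encoded.Native_Country_a.tolist())
```

After the rename, encoded.Native_Country_a is a valid attribute lookup, whereas the original name still requires bracket access.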