I just started learning machine learning. While practicing one of the tasks, I get a ValueError, even though I followed the same steps as the instructor. Please help.
dff
  Country     Name
0     AUS      Sri
1     USA  Vignesh
2     IND    Pechi
3     USA      Raj
First, I performed label encoding:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
X = dff.values
label_encoder = LabelEncoder()
X[:, 0] = label_encoder.fit_transform(X[:, 0])
Output:
X
array([[0, 'Sri'],
       [2, 'Vignesh'],
       [1, 'Pechi'],
       [2, 'Raj']], dtype=object)
Then I performed one-hot encoding on the same X:
onehotencoder=OneHotEncoder( categorical_features=[0])
X=onehotencoder.fit_transform(X).toarray()
I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-472-be8c3472db63> in <module>()
----> 1 X=onehotencoder.fit_transform(X).toarray()
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in fit_transform(self, X, y)
1900 """
1901 return _transform_selected(X, self._fit_transform,
-> 1902 self.categorical_features, copy=True)
1903
1904 def _transform(self, X):
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in _transform_selected(X, transform, selected, copy)
1695 X : array or sparse matrix, shape=(n_samples, n_features_new)
1696 """
-> 1697 X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
1698
1699 if isinstance(selected, six.string_types) and selected == "all":
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
380 force_all_finite)
381 else:
--> 382 array = np.array(array, dtype=dtype, order=order, copy=copy)
383
384 if ensure_2d:
ValueError: could not convert string to float: 'Raj'
Please edit my question if anything is wrong. Thanks in advance!
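The last frame of the traceback shows the whole array being converted with np.array(array, dtype=dtype, ...), and the Name column still holds strings after label encoding. The following is only a minimal sketch of that conversion step in isolation, assuming the float dtype implied by FLOAT_DTYPES in the traceback; it is not the library's actual code path:

```python
import numpy as np

# Same array as in the question after label encoding:
# column 0 is numeric, column 1 still holds strings.
X = np.array([[0, "Sri"], [2, "Vignesh"], [1, "Pechi"], [2, "Raj"]], dtype=object)

try:
    # Converting every cell to float fails on the first name it meets
    np.array(X, dtype=float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

So label encoding column 0 alone is not enough for that older encoder: every column passed in has to be convertible to float.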
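The error occurs because the old OneHotEncoder converts the entire array to float, and the Name column still contains strings such as 'Raj'. You can now apply OneHotEncoder directly to string columns without using LabelEncoder first. As scikit-learn moves toward version 0.22, many will want to do things this way to avoid warnings and potential errors (see DOCS and EXAMPLES).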
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
data = [["AUS", "Sri"], ["USA", "Vignesh"], ["IND", "Pechi"], ["USA", "Raj"]]
df = pd.DataFrame(data, columns=['Country', 'Name'])
X = df.values
# List the categories of each column explicitly (np.unique returns them sorted)
countries = np.unique(X[:, 0])
names = np.unique(X[:, 1])
ohe = OneHotEncoder(categories=[countries, names])
X = ohe.fit_transform(X).toarray()
print(X)
[[1. 0. 0. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0. 0. 1.]
 [0. 1. 0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 1. 0. 0.]]
The first three columns encode the country names; the last four encode the personal names.
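To check which output column corresponds to which value, the fitted encoder's categories_ attribute can be inspected, and inverse_transform maps the encoded rows back. This is a small sketch assuming scikit-learn 0.20 or later; the variable names encoded and decoded are mine, not part of the original code:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

data = [["AUS", "Sri"], ["USA", "Vignesh"], ["IND", "Pechi"], ["USA", "Raj"]]
df = pd.DataFrame(data, columns=["Country", "Name"])

ohe = OneHotEncoder(categories="auto")
encoded = ohe.fit_transform(df.values).toarray()

# One array of categories per input column, in output-column order
for column, categories in zip(df.columns, ohe.categories_):
    print(column, list(categories))

# Map the one-hot rows back to the original values
decoded = ohe.inverse_transform(encoded)
print(decoded)
```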
Alternatively, let the encoder determine the categories itself with categories='auto', which gives the same result:
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
data = [["AUS", "Sri"], ["USA", "Vignesh"], ["IND", "Pechi"], ["USA", "Raj"]]
df = pd.DataFrame(data, columns=['Country', 'Name'])
X = df.values
ohe = OneHotEncoder(categories='auto')  # infer the categories from the data
X = ohe.fit_transform(X).toarray()
print(X)
[[1. 0. 0. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0. 0. 1.]
 [0. 1. 0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 1. 0. 0.]]
Now for the interesting part: what if you only need to one-hot encode one specific column of your data?
(Note: I've left the last column as strings for easier illustration. In practice, it makes more sense to do this when the last column is already numerical.)
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
data = [["AUS", "Sri"], ["USA", "Vignesh"], ["IND", "Pechi"], ["USA", "Raj"]]
df = pd.DataFrame(data, columns=['Country', 'Name'])
X = df.values
countries = np.unique(X[:, 0])
ohe = OneHotEncoder(categories=[countries])  # specify ONLY unique country names
# Encode just the Country column; reshape(-1, 1) makes it a 2-D single-column array
tmp = ohe.fit_transform(X[:, 0].reshape(-1, 1)).toarray()
# Append the original Name column so that each row keeps its own name
X = np.append(tmp, X[:, 1].reshape(-1, 1), axis=1)
print(X)
[[1.0 0.0 0.0 'Sri']
 [0.0 0.0 1.0 'Vignesh']
 [0.0 1.0 0.0 'Pechi']
 [0.0 0.0 1.0 'Raj']]
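As an alternative to slicing and appending by hand, scikit-learn's ColumnTransformer can apply the encoder to selected columns and pass the rest through. This is a minimal sketch, assuming scikit-learn 0.20 or later; the Age column and its values are hypothetical, added only so that the untouched column is numerical:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# "Age" is a hypothetical numeric column with made-up values, for illustration only
df = pd.DataFrame({"Country": ["AUS", "USA", "IND", "USA"], "Age": [30, 25, 41, 36]})

ct = ColumnTransformer(
    transformers=[("country", OneHotEncoder(categories="auto"), ["Country"])],
    remainder="passthrough",  # leave all other columns unchanged
    sparse_threshold=0,       # always return a dense NumPy array
)
X = ct.fit_transform(df)
print(X)
```

The encoded country columns come first, followed by the passed-through column, so each row stays aligned with its original record.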