I have a data frame with many columns. Some of them hold strings and others hold integers. I used this code to encode my data frame:
le = LabelEncoder()
for col in df.columns:
    df_encoded[col] = df.apply(le.fit_transform)
It worked! But when I try to decode it with this code:
for col in df.columns:
    df_decoded[col] = df_encoded.apply(le.inverse_transform)
I receive this error:
ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', 'occurred at index MYCOLUMNNAME')
The type of data differs from column to column, so using apply with fit_transform won't work here. It will seem to work properly, but the single LabelEncoder is refitted on every column in turn, so at the end of the operation it is left fitted to the rightmost column. When you then try to apply inverse_transform, the LabelEncoder will replace all the elements in the other columns with the ones it saw in the rightmost column. E.g.:
df = pd.DataFrame([{'A': 1, 'B': 'p'}, {'A': 1, 'B': 'q'}, {'A': 2, 'B': 'o'}, {'A': 3, 'B': 'p'}])
df
   A  B
0  1  p
1  1  q
2  2  o
3  3  p
df = df.apply(le.fit_transform)
df
   A  B
0  0  1
1  0  2
2  1  0
3  2  1  # Looks fine
df.apply(le.inverse_transform)
   A  B
0  o  p
1  o  q
2  p  o
3  q  p  # Whoops
You will see the same result even if you iterate over the columns one by one with the same encoder, running fit_transform on every column first and inverse_transform on every column afterwards.
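A quick way to check this is to look at the encoder's classes_ attribute after the apply call. The snippet below is a minimal sketch that rebuilds the same toy frame as above (the variable name encoded is just illustrative):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'A': [1, 1, 2, 3], 'B': ['p', 'q', 'o', 'p']})

le = LabelEncoder()
encoded = df.apply(le.fit_transform)  # refits `le` on each column in turn

# Only the classes of the last fitted column ('B') are still stored.
print(list(le.classes_))  # ['o', 'p', 'q']
```

The integer classes of column A are gone at this point, which is why inverse_transform maps A's codes to letters.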
You need to refit the encoder on the matching column of the original (unencoded) df before inverting:
le = LabelEncoder()
df_encoded = pd.DataFrame(columns=df.columns)
df_decoded = pd.DataFrame(columns=df.columns)
for col in df.columns:
    df_encoded[col] = le.fit_transform(df[col])
df_encoded
   A  B
0  0  1
1  0  2
2  1  0
3  2  1
for col in df.columns:
    # Refit on the original column so the stored classes match this column
    le = le.fit(df[col])
    df_decoded[col] = le.inverse_transform(df_encoded[col])
df_decoded
   A  B
0  1  p
1  1  q
2  2  o
3  3  p  # Yeay
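Refitting works, but it requires the original data to still be around at decode time. An alternative, not part of the approach above, is to keep one fitted LabelEncoder per column. The following is a rough sketch under that assumption; the dictionary name encoders is hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'A': [1, 1, 2, 3], 'B': ['p', 'q', 'o', 'p']})

# One fitted encoder per column, keyed by column name.
encoders = {col: LabelEncoder().fit(df[col]) for col in df.columns}

# Encode each column with its own encoder.
df_encoded = pd.DataFrame(
    {col: encoders[col].transform(df[col]) for col in df.columns},
    index=df.index,
)

# Decode each column with the encoder that was fitted on it.
df_decoded = pd.DataFrame(
    {col: encoders[col].inverse_transform(df_encoded[col]) for col in df.columns},
    index=df.index,
)
```

With this layout, decoding only needs df_encoded and the encoders dictionary, so the encoders can be stored and reused later without the original frame.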