python pandas scikit-learn

TypeError: Feature names are only supported if all input features have string names, but your input has ['str', 'str_'] as column name types


When I try to fit scikit-learn's StandardScaler to my pandas DataFrame, I get the following error:

TypeError: Feature names are only supported if all input features have string names, but your input has ['str', 'str_'] as feature name / column name types. If you want feature names to be stored and validated, you must convert them all to strings, by using X.columns = X.columns.astype(str) for example. Otherwise you can remove feature / column names from your input data, or convert them all to a non-string data type.

This error occurs in this part of my code:

scaler.fit(data[map_keys])

Here data is a DataFrame and map_keys is a list containing only string values. Here is a sample of the data:

>> data[map_keys].head() outputs:

         loss  revenue  visit_number  ... 
1964      1.0      0.0           1.0  ...
1402      2.0      0.0           1.0  ...
2539      2.0      0.0           1.0  ...
86        2.0      0.0           1.0  ...
808       2.0      0.0           2.0  ...

To fix this, I converted all elements of map_keys to str with:

map_keys = [str(k) for k in map_keys]

as some of the elements in the list were of type np.str_ when I first encountered the issue. But the error still persists. Note that I am using scikit-learn version 1.2.1.
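
One possible explanation, offered as an assumption rather than a confirmed diagnosis: scikit-learn checks the labels stored in data.columns, not the map_keys list used to select them, so converting the list may leave np.str_ labels on the DataFrame itself. The sketch below lists the distinct label types, similar to the types named in the error message. label_type_names is a hypothetical helper, not part of pandas or scikit-learn.

```python
import numpy as np

def label_type_names(labels):
    """Return the sorted names of the distinct Python types found among the labels."""
    return sorted({type(label).__qualname__ for label in labels})

# Illustrative labels only: one np.str_ mixed in with plain str values.
labels = [np.str_("loss"), "revenue", "visit_number"]
print(label_type_names(labels))
```

On the real data, label_type_names(data[map_keys].columns) would show whether any np.str_ labels remain after the list has been converted.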


Solution

  • For me the astype(str) solution did not work, so I worked around it with:

        X = X.rename(str, axis="columns")
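
A minimal sketch of this workaround on a toy frame, assuming the column labels mix str and np.str_ as in the question. The column names are borrowed from the sample above, the values are made up for illustration, and with_plain_str_columns is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def with_plain_str_columns(frame):
    """Return a copy of frame with every column label passed through str()."""
    return frame.rename(str, axis="columns")

# Toy frame; an object-dtype Index is requested so the labels are stored as given.
columns = pd.Index([np.str_("loss"), "revenue", np.str_("visit_number")], dtype=object)
X = pd.DataFrame([[1.0, 0.0, 1.0], [2.0, 0.0, 2.0]], columns=columns)

X = with_plain_str_columns(X)

# Show the label types after the rename.
print([type(c).__name__ for c in X.columns])
```

rename calls str on each label individually, which may be why it helps in cases where astype(str) leaves np.str_ labels in place. After the rename, scaler.fit(X) should see only plain str column names.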