Search code examples
pythonpython-3.xregressionscalingkaggle

SyntaxError while trying to perform RobustScaler on Pandas Dataframe


I am working with the House Prices Kaggle dataset. I am trying to use the RobustScaler from sklearn only on numerical features in the dataset (LotFrontage, LotArea, etc.). First, I fit the data to the numerical values of my dataframe by calling select_dtypes(exclude=['object']. Once the transformer has been fit to those values, I call the transform function, trying to transform those same values I just fit the data on by setting the transformer equal to object excluded attributes. Once I attempt that, I get the following error message:

SyntaxError: can't assign to function call

Data has already been rid of null values. What has worked is when I set the transform results equal to some variable, I get the results back as a numpy.ndarray

from sklearn.preprocessing import RobustScaler
transformer = RobustScaler().fit(df_train.select_dtypes(exclude=['object']))

df_train.select_dtypes(exclude=['object']) = transformer.transform(df_train.select_dtypes(exclude=['object'])) # This doesn't work

test = transformer.transform(df_train.select_dtypes(exclude=['object'])) # This DOES work, but not in the format I need

All I want is for the transformed attributes to go back into the original pandas data frame at their corresponding locations. Is there some workaround I can implement if I can't convert the original dataframe results directly?


Solution

  • I managed to get it to work. Not sure how Pythonic this solution is, but it got me back on track:

    df_train[list(df_train.select_dtypes(exclude=['object']).columns)] = RobustScaler().fit_transform(df_train[list(df_train.select_dtypes(exclude=['object']).columns)])