Tags: python, macos, machine-learning, scikit-learn, linear-regression

Can't fix fit function in Linear Regression model


I am trying to fit a linear regression model. On macOS (M1), everything runs until the fit() call on the last line.

import pandas as pd
import numpy as np

df=pd.read_csv('USA_Housing.csv')

column=df.columns

X=df[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
       'Avg. Area Number of Bedrooms', 'Area Population', 'Address']]
y=df['Price']

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)

from sklearn.linear_model import LinearRegression

lm=LinearRegression()
lm.fit(X_train,y_train) # this throws an error

Running the script in PyCharm produces this error:

Traceback (most recent call last):
  File "/Users/krit/PycharmProjects/PythonRefresh/main.py", line 21, in <module>
    lm.fit(X_train,y_train)
  File "/Users/krit/PycharmProjects/PythonRefresh/venv/lib/python3.9/site-packages/sklearn/base.py", line 1151, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/Users/krit/PycharmProjects/PythonRefresh/venv/lib/python3.9/site-packages/sklearn/linear_model/_base.py", line 678, in fit
    X, y = self._validate_data(
  File "/Users/krit/PycharmProjects/PythonRefresh/venv/lib/python3.9/site-packages/sklearn/base.py", line 621, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "/Users/krit/PycharmProjects/PythonRefresh/venv/lib/python3.9/site-packages/sklearn/utils/validation.py", line 1147, in check_X_y
    X = check_array(
  File "/Users/krit/PycharmProjects/PythonRefresh/venv/lib/python3.9/site-packages/sklearn/utils/validation.py", line 917, in check_array
    array = _asarray_with_order(array, order=order, dtype=dtype, xp=xp)
  File "/Users/krit/PycharmProjects/PythonRefresh/venv/lib/python3.9/site-packages/sklearn/utils/_array_api.py", line 380, in _asarray_with_order
    array = numpy.asarray(array, order=order, dtype=dtype)
  File "/Users/krit/PycharmProjects/PythonRefresh/venv/lib/python3.9/site-packages/pandas/core/generic.py", line 2084, in __array__
    arr = np.asarray(values, dtype=dtype)
ValueError: could not convert string to float: '1836 Shaw Lane Apt. 733\nGracetown, PW 83118-5264'

The same code works on Windows, but after installing PyCharm on macOS it does not. How can I fix it?


Solution

  • You are trying to perform linear regression on string data. See this answer to a similar question. As the error states:

    ValueError: could not convert string to float: '1836 Shaw Lane Apt. 733\nGracetown, PW 83118-5264'
    

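    The string comes from the Address column, which is included in X. scikit-learn tries to convert it to a floating-point number (via numpy.asarray, as the traceback shows), which is not possible, and that is the cause of your error.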

    SOLUTION

    A quick fix is to remove every column that contains string values, such as Address.

    Also, I don't think the full address of the house is needed for a good prediction. I would either remove that column or use only part of it, such as "Shaw Lane Apt".

    In short, either remove that column or convert it into numbers. If you do want to use the address, categorize it by area and apply one-hot encoding (though this would increase the complexity of your project).
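
Both options can be sketched as follows. This is a minimal illustration, not a drop-in fix: the DataFrame is a tiny made-up stand-in for USA_Housing.csv (only the first address comes from the error message), and the State column and the regular expression that builds it are hypothetical choices for "categorize by area". Addresses in the real file that do not follow the "City, ST ZIP" pattern would get no state and would need separate handling.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up stand-in for USA_Housing.csv; only the first address is from the question.
df = pd.DataFrame({
    "Avg. Area Income": [60000.0, 75000.0, 58000.0, 82000.0, 67000.0, 71000.0],
    "Avg. Area Number of Rooms": [6.0, 7.0, 5.0, 8.0, 6.5, 7.2],
    "Address": [
        "1836 Shaw Lane Apt. 733\nGracetown, PW 83118-5264",
        "12 Oak Street\nSpringfield, PW 12345",
        "99 Elm Road\nRivertown, CA 54321",
        "7 Pine Avenue\nLakeside, CA 54322",
        "45 Maple Drive\nHilltop, TX 77001",
        "3 Cedar Court\nBrookfield, TX 77002",
    ],
    "Price": [1.00e6, 1.30e6, 0.90e6, 1.50e6, 1.10e6, 1.25e6],
})
y = df["Price"]

# Option 1: keep only numeric feature columns, which drops Address.
X_numeric = df.drop(columns=["Price"]).select_dtypes(include="number")
lm = LinearRegression().fit(X_numeric, y)

# Option 2: reduce the address to a coarse area (here a two-letter state code,
# a hypothetical choice) and one-hot encode that category.
df["State"] = df["Address"].str.extract(r",\s*([A-Z]{2})\s", expand=False)
X_encoded = pd.get_dummies(
    df.drop(columns=["Price", "Address"]), columns=["State"], dtype=float
)
lm_encoded = LinearRegression().fit(X_encoded, y)

print(X_numeric.columns.tolist())
print(X_encoded.columns.tolist())
```

In the question's script, the first option amounts to leaving 'Address' out of the column list used to build X.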