python, scikit-learn, linear-regression, iris-dataset

Could not convert string to float using iris dataset


I am using the iris dataset in my sample linear regression code, but when I try to train/fit the model I get an error:

ValueError: could not convert string to float: 'setosa'

This is bugging me, and I could not find a fix for it. Below is the code that I am using.

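import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris_df = pd.read_csv(r'C:\Users\Admin\iris.csv')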
iris_df.describe()

# Variables
X= iris_df.drop(labels= 'sepal length in cm', axis= 1)
y= iris_df['sepal length in cm']

# Splitting the Dataset 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.47, random_state= 42)

# Instantiating LinearRegression() Model
lr = LinearRegression()

# Training/Fitting the Model
lr.fit(X_train, y_train)

Solution

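  • As the example you are following shows, you need to convert your data to numeric form first. LinearRegression only accepts numeric features, and the species column contains strings such as 'setosa', which is what raises the ValueError: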

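    # Imports used below
    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error, mean_squared_error
    from sklearn.model_selection import train_test_split
    
    # Converting Objects to Numerical dtype
    # iris_df is the DataFrame read from iris.csv in the question.
    # iris.target holds the species as integer codes (0, 1, 2); this assumes
    # the rows of iris_df are in the same order as load_iris()
    iris = load_iris()
    iris_df.drop('species', axis= 1, inplace= True)
    target_df = pd.DataFrame(columns= ['species'], data= iris.target)
    iris_df = pd.concat([iris_df, target_df], axis= 1)
    
    # Variables
    X= iris_df.drop(labels= 'sepal length (cm)', axis= 1)
    y= iris_df['sepal length (cm)']
    
    # Splitting the Dataset 
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.33, random_state= 101)
    
    # Instantiating LinearRegression() Model
    lr = LinearRegression()
    
    # Training/Fitting the Model
    lr.fit(X_train, y_train)
    
    # Making Predictions
    pred = lr.predict(X_test)
    
    # Evaluating Model's Performance
    print('Mean Absolute Error:', mean_absolute_error(y_test, pred))
    print('Mean Squared Error:', mean_squared_error(y_test, pred))
    print('Root Mean Squared Error:', np.sqrt(mean_squared_error(y_test, pred)))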
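
  • An alternative you could try, if you would rather not treat the species as an ordered number (0 < 1 < 2), is to one-hot encode the column instead. This is only a minimal sketch and not part of the example you are following. It assumes a frame with four numeric columns plus a string species column, and it builds that frame from load_iris(as_frame=True) instead of your CSV, so the column names may differ from the ones in your file. The names non_numeric and encoded_df are made up for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Rebuild a frame shaped like the CSV in the question (assumed layout):
# four numeric columns plus a string 'species' column.
iris = load_iris(as_frame=True)
iris_df = iris.data.copy()
iris_df['species'] = iris.target.map(dict(enumerate(iris.target_names)))

# Find the non-numeric columns; these are what LinearRegression cannot convert.
non_numeric = iris_df.select_dtypes(exclude='number').columns.tolist()

# One-hot encode them; drop_first avoids a redundant (collinear) column.
encoded_df = pd.get_dummies(iris_df, columns=non_numeric, drop_first=True, dtype=float)

# Same target and split settings as in the answer above.
X = encoded_df.drop(columns='sepal length (cm)')
y = encoded_df['sepal length (cm)']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=101)

lr = LinearRegression().fit(X_train, y_train)
print('R^2 on the test split:', lr.score(X_test, y_test))
```

    Checking select_dtypes(exclude='number') on your own DataFrame before fitting is a quick way to see which columns would trigger the "could not convert string to float" error.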