Tags: python, scikit-learn, jupyter, stock, kaggle

ValueError: could not convert string to float: '$257.26' - sklearn.tree.DecisionTreeClassifier - Python


I am trying to fit a model to Apple stock data that I downloaded from Kaggle.

Here's a link to the data:

https://www.kaggle.com/tarunpaparaju/apple-aapl-historical-stock-data

Here's my code:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

stock_data = pd.read_csv(r"C:\Users\renuc\Documents\Rohan\Jupyter\Apple Stock\apple_stock_history.csv", skipinitialspace = True)
X = stock_data.drop(columns=['Date', 'Close/Last', 'High', 'Low'])
y = stock_data.drop(columns=['Date', 'Volume', 'Open'])

model = DecisionTreeClassifier()
model.fit(X, y)

However, I get the error ValueError: could not convert string to float: '$257.26'. I believe this is caused by the dollar sign in front of the value, but I'm not sure how to remove the dollar sign from every value in the dataset.

(I am using Jupyter)


Solution

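  • For one column (assign the result back to that column; assigning it to stock_data would replace the whole DataFrame with a single Series):

    stock_data["Close/Last"] = stock_data["Close/Last"].str.strip(" $")
    

    The column names have no leading space here because the question reads the file with skipinitialspace=True. Without that option they would be " Close/Last", " Volume", and so on.

    A more general solution (applies to every column except Date and Volume, which are dropped from the result):

    stock_data = stock_data.drop(["Date", "Volume"], axis=1).apply(lambda x: x.str.strip(" $"))
    

    Edit: If you want to keep all the columns, select the price columns and assign the cleaned values back to them (the result of drop() cannot be assigned to directly):

    price_cols = stock_data.columns.drop(["Date", "Volume"])
    stock_data[price_cols] = stock_data[price_cols].apply(lambda x: x.str.strip(" $"))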
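
    Note that str.strip only removes characters, so the cleaned columns still hold strings. Here is a minimal sketch of also converting them to floats with pd.to_numeric. The sample rows are made up for illustration, and the column names assume the file was read with skipinitialspace=True as in the question:

```python
import pandas as pd

# Made-up sample rows in the same layout as the Kaggle file (not real quotes).
stock_data = pd.DataFrame({
    "Date": ["02/28/2020", "02/27/2020"],
    "Close/Last": ["$273.36", "$273.52"],
    "Volume": [106721200, 80151380],
    "Open": ["$257.26", "$281.10"],
    "High": ["$278.41", "$286.00"],
    "Low": ["$256.37", "$272.96"],
})

# Columns that carry a leading dollar sign (assumed names).
price_cols = ["Close/Last", "Open", "High", "Low"]

for col in price_cols:
    # Strip spaces and dollar signs from both ends, then parse as numbers.
    stock_data[col] = pd.to_numeric(stock_data[col].str.strip(" $"))

print(stock_data.dtypes)
```

    After this step the price columns are numeric, so they can be used for arithmetic as well as for fitting.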
    

    After this, you can drop any column you wish:

    stock_data.drop(["Date"], axis=1, inplace=True)
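
    As an alternative to cleaning after loading, the dollar signs can be removed while the file is being read, using the converters argument of pd.read_csv. This is only a sketch: the CSV text is made-up sample data standing in for the real file, and parse_price is a hypothetical helper name.

```python
import io
import pandas as pd

# Made-up rows in the same layout as the Kaggle file (note the space after each comma).
csv_text = """Date, Close/Last, Volume, Open, High, Low
02/28/2020, $273.36, 106721200, $257.26, $278.41, $256.37
02/27/2020, $273.52, 80151380, $281.10, $286.00, $272.96
"""

def parse_price(text):
    """Turn a string such as '$257.26' into the float 257.26 (hypothetical helper)."""
    return float(text.strip(" $"))

# Assumed column names, as produced by skipinitialspace=True.
price_cols = ["Close/Last", "Open", "High", "Low"]

stock_data = pd.read_csv(
    io.StringIO(csv_text),      # replace with the path to the real CSV file
    skipinitialspace=True,      # drops the space after each comma, header included
    converters={col: parse_price for col in price_cols},
)

print(stock_data.dtypes)
```

    With this approach the DataFrame never contains the "$" strings, so no separate cleaning pass is needed.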