I recently started studying ML from a YouTube tutorial. Based on what the tutorial covered, I decided to go a step further and apply it to a kind of guessing game.
The game has multiple scenarios, and in each one the player has to collect some numbers before moving on to the next stage. So I thought I would apply ML to this and see what happens.
My CSV file has 16 columns (the stage plus the numbers 1 to 15) and lots of rows. To predict the numbers for the last stage (1988), I passed it directly to "...predict([[1988]]))" and got this error:
ValueError: Expected 2D array, got 1D array instead.
I know it's almost impossible to predict anything in this case, but my main goal is to reduce the number of mistakes and see how well ML can handle this kind of problem.
Could you show me what I'm doing wrong, and where? To explain better, my code is below:
import pandas
from sklearn.tree import DecisionTreeClassifier
game_data = pandas.read_csv('game_data2.csv')
game_list = game_data.drop(columns=['n1', 'n2', 'n3', 'n4', 'n5',
'n6', 'n7', 'n8', 'n9', 'n10',
'n11', 'n12', 'n13', 'n14', 'n15'])
game_stage = game_data['STAGE']
model = DecisionTreeClassifier()
model.fit(game_stage, game_list)
predictions = model.predict([[1988]])
predictions
Thank you in advance!
You need to reshape your game_stage variable into a 2D array before passing it into model.fit().
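As a quick illustration of why the reshape matters: scikit-learn estimators expect the feature input to have shape (n_samples, n_features), while a single pandas column converts to a 1D array. The sketch below uses made-up stage numbers (an assumption, not values from your CSV) just to show the two shapes.

```python
import numpy as np

# Hypothetical stage numbers standing in for the STAGE column
stages = np.array([1985, 1986, 1987])

# 1D: three values with no explicit feature axis
print(stages.shape)

# reshape(-1, 1) keeps one column and lets NumPy infer the row count
stages_2d = stages.reshape(-1, 1)

# 2D: three samples, one feature each
print(stages_2d.shape)
```

The 2D version is the layout that model.fit() and model.predict() accept, which is also why the value to predict is written as [[1988]] rather than [1988].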
If you modify your code like this, you don't get the error message:
import pandas
from sklearn.tree import DecisionTreeClassifier
import numpy as np
# Read data
game_data = pandas.read_csv('game_data2.csv')
game_list = game_data.drop(columns=['n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7', 'n8', 'n9', 'n10', 'n11', 'n12', 'n13', 'n14', 'n15'])
game_stage = game_data['STAGE']
# Reshape into 2D array using numpy
game_stage = np.asarray(game_stage)
# -1 means this dimension is inferred from the data
game_stage = game_stage.reshape(-1,1)
# Train model
model = DecisionTreeClassifier()
model.fit(game_stage, game_list)
# Prediction
predictions = model.predict([[1988]])
predictions
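One more thing you may want to check: if the CSV really contains only STAGE and n1 to n15, then dropping n1..n15 leaves just the STAGE column in game_list, so the model is trained to map a stage to itself rather than to the 15 numbers. Below is a minimal sketch of what I assume you intended, with the stage as the single feature and the 15 number columns as targets. The data is randomly generated as a stand-in for game_data2.csv, so the predicted values themselves mean nothing.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for game_data2.csv: a STAGE column plus n1..n15
rng = np.random.default_rng(0)
number_cols = [f"n{i}" for i in range(1, 16)]
game_data = pd.DataFrame(rng.integers(1, 26, size=(50, 15)), columns=number_cols)
game_data.insert(0, "STAGE", np.arange(1938, 1988))

# Double brackets keep a 2D frame of shape (n_samples, 1), so no reshape is needed
X = game_data[["STAGE"]]
# The 15 number columns are the targets (multi-output classification)
y = game_data[number_cols]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Pass a frame with the same column name to avoid a feature-name warning
predictions = model.predict(pd.DataFrame({"STAGE": [1988]}))
print(predictions.shape)
```

With this layout, predictions holds one row with 15 values, one per number column. Whether a decision tree can learn anything useful from the stage number alone is a separate question.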