python pandas scikit-learn sklearn-pandas

Error while trying to predict numbers: Expected 2D array, got 1D array instead


I recently started studying ML by following a YouTube tutorial. Building on what the tutorial covered, I decided to go a step further and apply it to a kind of guessing game.

The game has multiple scenarios, each with some numbers the player has to collect before moving on to the next stage. So I thought I would apply ML to this and see what happens.

My CSV file has 16 columns (the stage plus the numbers 1 to 15) and lots of rows. To predict the numbers for the last stage (1988), I passed it directly to "...predict([[1988]])" and got:

ValueError: Expected 2D array, got 1D array instead.

I know prediction is almost impossible in this case, but my main goal here is to reduce the number of mistakes and see how well ML can handle this.

Could you show me what I'm doing wrong, and where? To explain better, my code is below:

import pandas
from sklearn.tree import DecisionTreeClassifier


game_data = pandas.read_csv('game_data2.csv')
game_list = game_data.drop(columns=['n1', 'n2', 'n3', 'n4', 'n5', 
                                    'n6', 'n7', 'n8', 'n9', 'n10', 
                                    'n11', 'n12', 'n13', 'n14', 'n15'])

game_stage = game_data['STAGE']

model = DecisionTreeClassifier()
model.fit(game_stage, game_list)


predictions = model.predict([[1988]])
predictions

Thank you in advance!


Solution

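  • You need to reshape your game_stage variable into a 2D array before passing it to model.fit(). The error is raised by fit(), not by predict(): scikit-learn expects the features as a 2D array of shape (n_samples, n_features), and a single pandas column such as game_data['STAGE'] is 1D.

    With the following change to your code, the error message goes away: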

    import pandas
    from sklearn.tree import DecisionTreeClassifier
    import numpy as np
    
    # Read data
    game_data = pandas.read_csv('game_data2.csv')
    game_list = game_data.drop(columns=['n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7', 'n8', 'n9', 'n10', 'n11', 'n12', 'n13', 'n14', 'n15'])
    game_stage = game_data['STAGE']
    
    # Reshape into 2D array using numpy
    game_stage = np.asarray(game_stage)
    # -1 means this dimension is inferred from the data
    game_stage = game_stage.reshape(-1,1)
    
    # Train model
    model = DecisionTreeClassifier()
    model.fit(game_stage, game_list)
    
    # Prediction
    predictions = model.predict([[1988]])
    predictions
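
The sketch below illustrates the shape issue without needing the CSV file. It is only an illustration under some assumptions: the data is randomly generated as a stand-in for game_data2.csv, the column names follow the question (STAGE, n1 to n15), and it assumes the goal is to predict the 15 numbers from the stage, so the number columns are used as targets rather than dropped. Selecting the column with double brackets is an alternative to reshape() that keeps the data 2D.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for game_data2.csv: a STAGE column plus n1..n15.
rng = np.random.default_rng(0)
number_cols = [f"n{i}" for i in range(1, 16)]
game_data = pd.DataFrame(
    rng.integers(1, 26, size=(50, 15)), columns=number_cols
)
game_data.insert(0, "STAGE", np.arange(1, 51))

# Single brackets give a 1D Series; double brackets give a 2D DataFrame.
stage_1d = game_data["STAGE"]
stage_2d = game_data[["STAGE"]]
print(stage_1d.shape)  # (50,)
print(stage_2d.shape)  # (50, 1)

# Features: the stage. Targets: the 15 numbers (an assumption about the goal).
X = stage_2d
y = game_data[number_cols]

# DecisionTreeClassifier accepts a 2D target, one column per output.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Predict for a new stage, using the same column name as in training.
new_stage = pd.DataFrame({"STAGE": [51]})
predictions = model.predict(new_stage)
print(predictions.shape)  # (1, 15)
```

With the real file, game_data would come from pandas.read_csv('game_data2.csv') instead of the generated frame. Note that a decision tree cannot extrapolate beyond the stages it was trained on, so a prediction for an unseen stage should be treated as a rough guess at best.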