Tags: python, pandas, numpy, train-test-split

Problem when splitting data: KeyError: "None of [Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], dtype='int64')] are in the [columns]"


I am trying to run a train/test split on the wine.data dataset, but I hit an error when initializing X and y:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

from sklearn.model_selection import cross_val_score

wine = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data")

print(wine.shape)
wine.head()
X = wine[np.arange(1,14)]
y = wine[0]

The code after this segment does not run because I get this error message:

KeyError: "None of [Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], dtype='int64')] are in the [columns]"

I have tried to resolve this by changing the range used for X and by changing the np.arange call, but neither fixes the problem.

Any help or advice would be greatly appreciated, thank you!


Solution

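  • You forgot to pass header=None to pd.read_csv. The CSV you are downloading doesn't have a header line, so if you don't specify header=None, the first line of data is used as the header. The column labels are then strings taken from that first row rather than the integers 0 to 13, which is why wine[np.arange(1,14)] finds none of them and raises the KeyError.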

    Try with

    wine = pd.read_csv(
        "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data",
        header=None
    )
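
To see the difference without downloading anything, here is a minimal sketch that parses a small in-memory CSV both ways. The two rows in csv_text are made-up stand-ins with only three feature columns (an assumption for illustration, not the real wine.data contents), and the variable names are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Made-up, headerless sample: a class label followed by 3 feature columns.
# (The real wine.data has 13 feature columns; this is only an illustration.)
csv_text = "1,14.23,1.71,2.43\n2,13.20,1.78,2.14\n"

# Default behaviour: the first data row is consumed as the header,
# so the column labels become strings and one row of data is lost.
with_default = pd.read_csv(io.StringIO(csv_text))
print(list(with_default.columns))

# header=None: every row is data and the columns are labelled 0, 1, 2, ...
with_header_none = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(with_header_none.columns))

# Integer labels now exist, so selecting by np.arange works.
X = with_header_none[np.arange(1, 4)]
y = with_header_none[0]
print(X.shape, y.tolist())
```

With the real file, the same selection would be wine[np.arange(1, 14)] for X and wine[0] for y, as in the question.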