I am trying to perform a train/test split on the wine.data dataset, but initializing X and y fails:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
wine = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data")
print(wine.shape)
wine.head()
X = wine[np.arange(1,14)]
y = wine[0]
The rest of the code below this segment will not run as I get the error message:
KeyError: "None of [Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], dtype='int64')] are in the [columns]"
I have tried changing the range used for X and replacing the np.arange call, but neither fixes the problem.
Any help or advice would be greatly appreciated, thank you!
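You forgot to pass header=None to pd.read_csv. The CSV you are downloading doesn't have a header line, so if you don't specify header=None, the first line of data is used as the header. The columns are then labeled with the values from that row rather than the integers 0 to 13, which is why wine[np.arange(1,14)] raises a KeyError.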
Try with
wine = pd.read_csv(
"https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data",
header=None
)
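
To see the difference without downloading anything, here is a minimal sketch that reads a small in-memory sample in the same layout as wine.data (a class label followed by 13 feature columns). The sample rows and the names sample, without_header and with_header are assumptions made up for illustration, not part of your code.

```python
import io

import numpy as np
import pandas as pd

# Illustrative rows only: one class label followed by 13 feature values,
# with no header line, mimicking the layout of wine.data.
sample = (
    "1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065\n"
    "1,13.2,1.78,2.14,11.2,100,2.65,2.76,.26,1.28,4.38,1.05,3.4,1050\n"
    "1,13.16,2.36,2.67,18.6,101,2.81,3.24,.3,2.82,5.68,1.03,3.17,1185\n"
)

# Default behaviour: the first data row is consumed as the header,
# so the column labels are strings and one row of data is lost.
without_header = pd.read_csv(io.StringIO(sample))
print(without_header.shape)
print(list(without_header.columns[:3]))

# With header=None every row is data and the columns are labeled 0..13.
with_header = pd.read_csv(io.StringIO(sample), header=None)
print(with_header.shape)
print(list(with_header.columns))

# The selection from the question now works, because the integer labels exist.
X = with_header[np.arange(1, 14)]
y = with_header[0]
print(X.shape, y.shape)
```

If you would rather not depend on the column labels at all, positional indexing such as wine.iloc[:, 1:] for X and wine.iloc[:, 0] for y selects the same columns.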