#Start your code
#Hint - use pandas to read the Excel file data and then extract the data to a numpy array "data"
df = pd.read_excel('A3data.xlsx')
data = df[['Exam1', 'Exam2','Admission Decision']].to_numpy()
#End your code
print('shape of sample data:', data.shape) # Check if data is 100 by 3
# Load data into X_train, a numpy array of shape (100,2), and y_train of shape (100,1)
X_train = data[0, [0,2]] <---- my attempt
y_train = data[0, [0,2]]
#It is a good idea to visualize data on a scatter plot, if possible. Here we can.
x_class0 = X_train[y_train == 0] <-- this is where the error is occurring
x_class1 = X_train[y_train == 1]
# Create a scatter plot
plt.scatter(x_class0[:, 0], x_class0[:, 1], color='blue', label='Not Admitted')
plt.scatter(x_class1[:, 0], x_class1[:, 1], color='red', label='Admitted')
The error says: too many indices for array: array is 1-d but 2 was indexed
If you want to index it like a numpy array, I think you'll need to use numpy slicing:
import numpy as np
import pandas as pd

all_data = pd.DataFrame(np.random.rand(10,3)).to_numpy()
x = all_data[:, :2]  # all rows, first two columns
y = all_data[:, 2]   # all rows, last column
x, y
Your current code, data[0, [0,2]], returns only two values from a single row (row 0, columns 0 and 2), not the full columns.
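To make the difference concrete, here is a minimal sketch comparing the shapes the two indexing styles produce. The array below is synthetic stand-in data (the real values come from A3data.xlsx), and the column order Exam1, Exam2, Admission Decision is assumed from the question.

```python
import numpy as np

# Synthetic stand-in for the real data: 100 rows, columns = Exam1, Exam2, label
rng = np.random.default_rng(0)
data = np.column_stack([
    rng.uniform(30, 100, size=100),   # hypothetical Exam1 scores
    rng.uniform(30, 100, size=100),   # hypothetical Exam2 scores
    rng.integers(0, 2, size=100),     # hypothetical 0/1 admission decision
])

single_row = data[0, [0, 2]]   # row 0 only, columns 0 and 2 -> 1-D
X_train = data[:, :2]          # every row, first two columns
y_train = data[:, 2]           # every row, last column -> 1-D labels

print(single_row.shape)  # (2,)
print(X_train.shape)     # (100, 2)
print(y_train.shape)     # (100,)
```

If the assignment really needs y_train with shape (100,1), y_train.reshape(-1, 1) should give that, but the 1-D form is the one that works as a row mask in X_train[y_train == 0].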
However, that may cause problems down the road when you've lost your column headers. I'd suggest the following instead given your example:
data = df[['Exam1', 'Exam2','Admission Decision']]
x_train = data[['Exam1', 'Exam2']]
y_train = data['Admission Decision']
(But remember to do your train-test split before you separate into x and y)
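One possible way to do that split while staying in pandas is sketched below. The DataFrame is made-up data with the same column names as the question, and the 80/20 ratio and random_state value are assumptions chosen for illustration, not anything the assignment specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the spreadsheet, with the same column names
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Exam1": rng.uniform(30, 100, size=100),
    "Exam2": rng.uniform(30, 100, size=100),
    "Admission Decision": rng.integers(0, 2, size=100),
})

# Split rows first: a random 80% for training, the rest for testing
train = df.sample(frac=0.8, random_state=0)
test = df.drop(train.index)

# Then separate features and labels by column name
x_train = train[["Exam1", "Exam2"]]
y_train = train["Admission Decision"]
x_test = test[["Exam1", "Exam2"]]
y_test = test["Admission Decision"]

print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

Because the rows are split before the columns are separated, each feature row stays paired with its own label.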
However, the issue you're having here:
# Notice how these two lines are identical!
X_train = data[0, [0,2]] <---- my attempt
y_train = data[0, [0,2]]
#It is a good idea to visualize data on a scatter plot, if possible. Here we can.
x_class0 = X_train[y_train == 0] <-- this is where the error is occurring
x_class1 = X_train[y_train == 1]
is because X_train and y_train are defined identically, so they hold the same values. As such, y_train contains two values (row 0, columns 0 and 2) rather than a single column of labels.
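The sketch below reproduces the problem on a tiny made-up array and then shows the column-slicing version. The four rows of data are invented for illustration. In this sketch the exception is raised by the [:, 0] indexing on the filtered result, which is still 1-D, rather than by the mask itself.

```python
import numpy as np

# Tiny made-up dataset: columns are Exam1, Exam2, Admission Decision
data = np.array([
    [34.0, 78.0, 0.0],
    [60.0, 86.0, 1.0],
    [79.0, 75.0, 1.0],
    [45.0, 56.0, 0.0],
])

# The original attempt: both names hold row 0, columns 0 and 2
X_bad = data[0, [0, 2]]
y_bad = data[0, [0, 2]]
x_class0_bad = X_bad[y_bad == 0]   # still 1-D
try:
    x_class0_bad[:, 0]             # 2-D indexing on a 1-D array
except IndexError as exc:
    error_message = str(exc)
    print(error_message)

# Slicing full columns instead
X_train = data[:, :2]              # shape (4, 2)
y_train = data[:, 2]               # shape (4,)
x_class0 = X_train[y_train == 0]   # rows where the label is 0
x_class1 = X_train[y_train == 1]   # rows where the label is 1
print(x_class0.shape, x_class1.shape)
```

With the 2-D x_class0 and x_class1, the plt.scatter calls from the question can index [:, 0] and [:, 1] as written.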