I am using the basic linear classifier provided by scikit-learn's Perceptron class:

from sklearn import linear_model

clf = linear_model.Perceptron(n_iter=12)
clf.fit(X, Y)
I have an X array where the rows are instances and the columns are binary features, and a Y array with my classes. My data has three classes. I have two questions:

1) The perceptron algorithm requires a bias term. How does the scikit-learn Perceptron handle the bias? Should I add a "bias column" (all ones) to my input X data? Does the Perceptron automatically add a bias column to X? Or does it handle the bias separately?

2) How can I find the training error for my perceptron?
1) The bias is handled automatically: Perceptron is constructed with fit_intercept=True by default, so it learns a separate intercept term and you should not add a column of ones to X. If you're unsure, try training on two versions of your data, your original data and a scaled version (StandardScaler in sklearn), and compare the results.
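A minimal sketch to confirm this, using made-up three-class data (the arrays and values here are hypothetical; max_iter is the modern name for the iteration parameter, called n_iter in older scikit-learn releases):

from sklearn.linear_model import Perceptron
import numpy as np

# Tiny hypothetical dataset: 6 instances, 2 binary features, 3 classes.
X = np.array([[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]])
Y = np.array([0, 1, 2, 0, 0, 1])

clf = Perceptron(max_iter=12, fit_intercept=True)  # fit_intercept=True is the default
clf.fit(X, Y)

# One intercept per class (one-vs-rest), learned separately from coef_,
# so no bias column is needed in X.
print(clf.intercept_.shape)  # one bias value per class
print(clf.coef_.shape)       # one weight vector per class

The learned biases live in clf.intercept_ and the feature weights in clf.coef_, which shows the intercept is stored separately from the input features.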
2)
from sklearn import linear_model, metrics

clf = linear_model.Perceptron(n_iter=12)
clf.fit(X, Y)
training_predictions = clf.predict(X)
# accuracy_score takes (y_true, y_pred); or pick another metric from the metrics module.
training_error = 1 - metrics.accuracy_score(Y, training_predictions)
As you can see, calculating the error on the same data you trained with gives the training error; "test error" is the error when predicting on data your model has not "seen" yet. I subtracted from 1 because accuracy gives the fraction of successful matches (a measure of success), whereas training error is a measure of error. There are many error metrics; accuracy is just one.
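Putting it together, here is an end-to-end sketch with randomly generated data standing in for your X and Y (the data, sizes, and max_iter value are all made up for illustration):

import numpy as np
from sklearn import metrics
from sklearn.linear_model import Perceptron

# Hypothetical data: 60 instances, 5 binary features, 3 classes.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(60, 5))
Y = rng.randint(0, 3, size=60)

clf = Perceptron(max_iter=12)  # n_iter in older scikit-learn releases
clf.fit(X, Y)

# Predict on the training data itself to get the training error.
training_predictions = clf.predict(X)
training_error = 1 - metrics.accuracy_score(Y, training_predictions)
print("training error:", training_error)

With random labels like these the training error will typically be well above zero; on your real data it measures how well the perceptron fits the data it was trained on.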