For one of my assignments in my AI class we were tasked with creating a perceptron learning implementation of the Widrow-Hoff delta rule. I've coded this implementation in Java.
The project is available on GitHub: https://github.com/dmcquillan314/CS440-Homework/tree/master/CS440-HW2-1
The issue that I'm having is not with the creation of the perceptron. That is working fine.
In the project, after training the perceptron, I fed it an unclassified dataset so that it would predict the classification of each input vector. This also worked fine.
My issue pertains to learning which feature of the inputs is the most important.
For example, if the features in each input vector were color, car model, and car make, how would one go about determining which feature is the most important?
My original understanding was that this could be done by calculating the correlation coefficient between the value of that feature for each input and the classification vector that is produced. However, this turned out to be a false assumption.
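To make clear what I tried, here is a minimal sketch of that correlation approach, assuming the Pearson coefficient is the one meant. The class and method names are made up for illustration and are not taken from the linked project; the data in `main` is the first column and the classification column of the sample input vectors listed in the edit below.

```java
public class FeatureCorrelation {

    // Pearson correlation between one feature column and the class labels.
    // Returns 0 when either column is constant, since the coefficient is
    // undefined in that case (this fallback is a choice made for the sketch).
    public static double pearson(double[] feature, double[] labels) {
        int n = feature.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += feature[i];
            meanY += labels[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = feature[i] - meanX;
            double dy = labels[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        double denom = Math.sqrt(varX * varY);
        return denom == 0.0 ? 0.0 : cov / denom;
    }

    public static void main(String[] args) {
        // First feature and classification of the four sample vectors.
        double[] firstFeature = {55, 59, 45, 59};
        double[] labels = {0, 1, 0, 1};
        System.out.println(pearson(firstFeature, labels));
    }
}
```

Repeating this for every feature column gives one coefficient per feature, which is the ranking I had assumed would answer the question.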
Is there some other way that the most important feature can be learned?
EDIT
Sample weight vector:
( -752, 4771, 17714, 762, 6, 676, 3060, -2004, 5459, 9591.299, 3832, 14963, 20912 )
Sample input vectors:
(55, 1, 2, 130, 262, 0, 0, 155, 0, 0, 1, 0, 3, 0)
(59, 1, 3, 126, 218, 1, 0, 134, 0, 2.2, 2, 1, 6, 1)
(45, 1, 2, 128, 308, 0, 2, 170, 0, 0, 1, 0, 3, 0)
(59, 1, 4, 110, 239, 0, 2, 142, 1, 1.2, 2, 1, 7, 1)
The last element of each input vector is the classification.
I will post an answer here when I find one. So far I believe that the answer given by the instructor is inaccurate.
This turned out to be a lot simpler than I originally thought. The answer/process is as follows:
Given a set of input vectors such as the following:
[1,0,1,0], [0,1,0,1]
This data is already constrained between 0 and 1, which keeps the variance of each feature small. My data, however, looks more like the following:
[0,145,0,132],[0,176,0,140]
This makes the variance of some input features much larger than that of others, so the weight vector cannot be used as an indicator of feature importance. For the weight vector to be such an indicator, we must first normalize the data by dividing each feature by that feature's maximum.
For the above set, the per-feature maxima are: [0,176,0,140]
The result is a set of uniformly scaled feature vectors, and the weight vector learned from them then serves as an indicator of feature importance.
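The normalization step can be sketched as follows. This is only an illustration under a few assumptions: the class and method names are invented here and do not come from the linked project, features whose maximum is 0 are left unchanged to avoid dividing by zero, and "most important" is taken to mean the weight with the largest absolute value after retraining on the normalized data.

```java
import java.util.Arrays;

public class FeatureMaxNormalizer {

    // Divides every feature (column) by the largest absolute value seen
    // in that column. Columns whose maximum is 0 are copied unchanged.
    public static double[][] normalizeByFeatureMax(double[][] inputs) {
        int features = inputs[0].length;
        double[] max = new double[features];
        for (double[] row : inputs) {
            for (int j = 0; j < features; j++) {
                max[j] = Math.max(max[j], Math.abs(row[j]));
            }
        }

        double[][] scaled = new double[inputs.length][features];
        for (int i = 0; i < inputs.length; i++) {
            for (int j = 0; j < features; j++) {
                scaled[i][j] = (max[j] == 0.0) ? inputs[i][j] : inputs[i][j] / max[j];
            }
        }
        return scaled;
    }

    // Index of the weight with the largest absolute value. Only meaningful
    // for weights trained on normalized inputs.
    public static int indexOfLargestAbsWeight(double[] weights) {
        int best = 0;
        for (int j = 1; j < weights.length; j++) {
            if (Math.abs(weights[j]) > Math.abs(weights[best])) {
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] inputs = {
            {0, 145, 0, 132},
            {0, 176, 0, 140}
        };
        System.out.println(Arrays.deepToString(normalizeByFeatureMax(inputs)));
    }
}
```

The perceptron would then be retrained on the output of `normalizeByFeatureMax`, and the resulting weights compared by magnitude. The sample weight vector shown earlier was trained on unnormalized data, so it should not be ranked this way.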