I am using a very small dataset to teach myself predictive data analytics. I am using both Weka and Orange to try to solve this problem.
To start with, I am using this CSV file to train the model:
gender,weight
M,82
F,71
M,90
F,76
M,88
F,56
M,100
F,63
M,84
F,79
M,92
F,66
You will notice that all the F values are below 80 and all the M values are above 80.
I then have this data file:
weight,gender
70,
100,
69,
76,
99,
Notice that the 'gender' value is missing.
I would like to come up with a system that will read the data file and place either an M or F into the gender field based on some data analysis.
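A minimal Python sketch of the read, predict and write step, using only the standard library. The rule inside `predict_gender` is a hypothetical hard-coded cut-off of 80, taken from the observation that every F weighs under 80 and every M over 80; a trained model would replace it. The data file is embedded as a string so the example runs as-is.

```python
import csv
import io

# The data file from the question, embedded so the sketch runs as-is.
# Each row has an empty gender field to be filled in.
TEST_CSV = """weight,gender
70,
100,
69,
76,
99,
"""


def predict_gender(weight: float) -> str:
    # Hypothetical rule: the cut-off of 80 comes from eyeballing the training
    # data (all F < 80, all M > 80); a trained model should replace it.
    return "M" if weight > 80 else "F"


def fill_gender(csv_text: str) -> str:
    """Read rows with a 'weight' column and return CSV text with 'gender' filled in."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" drops stray extra columns (e.g. from a trailing comma).
    writer = csv.DictWriter(out, fieldnames=["weight", "gender"],
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["gender"] = predict_gender(float(row["weight"]))
        writer.writerow(row)
    return out.getvalue()


print(fill_gender(TEST_CSV), end="")
```

Keeping the prediction in its own function means the I/O code stays the same whichever model ends up deciding M or F.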
I looked into linear regression, but that models a relationship between two numeric variables (as X increases, so does Y), and gender is a category rather than a number.
I then looked into k-means clustering, but all that did was show me two clusters, with M > 80 and F < 80.
Can you advise on a technique I can use to apply predictions to my dataset?
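Clustering ignores the M/F labels, so on its own it cannot name a cluster. One way to turn it into a predictor, sketched here in plain Python as an illustration (not how Weka or Orange do it internally), is to run 2-means on the weights, label each cluster with the majority gender of the training rows that fall in it, and assign new rows to the nearest centroid. Starting the centroids at the minimum and maximum weight is an assumption made to keep the run deterministic.

```python
from collections import Counter

# Training rows from the question: (gender, weight).
TRAIN = [("M", 82), ("F", 71), ("M", 90), ("F", 76), ("M", 88), ("F", 56),
         ("M", 100), ("F", 63), ("M", 84), ("F", 79), ("M", 92), ("F", 66)]
NEW_WEIGHTS = [70, 100, 69, 76, 99]


def nearest(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))


def kmeans_2(values, max_iter=100):
    """Plain 2-means on 1-D data; starting at min and max is an assumption for determinism."""
    centroids = [float(min(values)), float(max(values))]
    for _ in range(max_iter):
        clusters = [[], []]
        for v in values:
            clusters[nearest(v, centroids)].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # centroids stopped moving
            break
        centroids = updated
    return centroids


def label_clusters(train, centroids):
    """Name each cluster after the majority gender of the training rows in it."""
    votes = [Counter() for _ in centroids]
    for gender, weight in train:
        votes[nearest(weight, centroids)][gender] += 1
    return [v.most_common(1)[0][0] if v else "?" for v in votes]


centroids = kmeans_2([w for _, w in TRAIN])
labels = label_clusters(TRAIN, centroids)
predictions = [labels[nearest(w, centroids)] for w in NEW_WEIGHTS]
print(centroids, labels, predictions)
```

When clustering on weight alone, the boundary between the two centroids settles around 77 rather than 80, so the F row with weight 79 lands in the mostly-M cluster: clustering never looks at the labels, so it has no reason to put the boundary where the genders actually split.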
Much appreciated
This looks like something a decision tree can do easily. I looked up a Weka tutorial for you, since I've never used Weka myself, but the concepts are the same.
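To show what a decision tree learns on this data, here is a sketch in plain Python of a one-level tree (a decision stump): it tries a threshold midway between each pair of neighbouring weights, keeps the one with the lowest weighted Gini impurity, and labels each side with its majority gender. Gini is the default criterion of scikit-learn's `DecisionTreeClassifier`; Weka's J48 uses gain ratio instead, so the threshold it reports may differ slightly.

```python
from collections import Counter

# Training rows from the question: (gender, weight).
TRAIN = [("M", 82), ("F", 71), ("M", 90), ("F", 76), ("M", 88), ("F", 56),
         ("M", 100), ("F", 63), ("M", 84), ("F", 79), ("M", 92), ("F", 66)]
NEW_WEIGHTS = [70, 100, 69, 76, 99]


def gini(labels):
    """Gini impurity: 0 when all labels agree."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(rows):
    """Learn a one-level decision tree: pick the weight threshold with lowest weighted Gini."""
    weights = sorted({w for _, w in rows})
    best_t, best_score = None, float("inf")
    for lo, hi in zip(weights, weights[1:]):
        t = (lo + hi) / 2  # candidate split between neighbouring values
        left = [g for g, w in rows if w <= t]
        right = [g for g, w in rows if w > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best_score:
            best_t, best_score = t, score
    if best_t is None:
        raise ValueError("need at least two distinct weights to split on")
    # Each leaf predicts the majority gender on its side of the split.
    left_label = Counter(g for g, w in rows if w <= best_t).most_common(1)[0][0]
    right_label = Counter(g for g, w in rows if w > best_t).most_common(1)[0][0]
    return best_t, left_label, right_label


def predict(stump, weight):
    threshold, left_label, right_label = stump
    return left_label if weight <= threshold else right_label


stump = fit_stump(TRAIN)
print(stump, [predict(stump, w) for w in NEW_WEIGHTS])
```

On the training data the best split is weight <= 80.5 → F, otherwise M, which separates the two genders perfectly; the five rows in the data file come out as F, M, F, F, M. A deeper tree only starts to matter with more attributes or with classes that overlap.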