Tags: python, data-analysis, weka, prediction, orange

Best data analysis method to predict where a certain value will fit in a dataset


I am using a very small dataset to teach myself predictive data analytics. I am using both Weka and Orange to try to solve the problem described below.

To start with, I am using this CSV file to train the system:

gender,weight
M,82
F,71
M,90
F,76
M,88
F,56
M,100
F,63
M,84
F,79
M,92
F,66

You will notice that all the F values are below 80 and all the M values are above 80.

I then have this data file:

weight,gender
70,
100,
69,
76,
99,

Notice that the 'gender' value is missing.

I would like to come up with a system that will read the data file and place either an M or an F into the gender field based on some data analysis.

I looked into linear regression, but that models a relationship between two numeric values (as X increases, so does Y), whereas here I want to predict a category.

I then looked into k-means clustering, but all that did was show me two clusters, with M above 80 and F below 80.

Can you please suggest a method I can use to make predictions for my dataset?

Much appreciated


Solution

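  • This looks like something a decision tree can do easily: it is a classification problem, and a tree can learn a weight threshold that separates F from M. I have never used Weka myself, so I looked up a Weka tutorial for you, but the concepts are the same in any tool.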
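
To make the idea concrete, here is a minimal sketch in plain Python of what a one-level decision tree (a "decision stump") could learn from the training data in the question. It is not what Weka or Orange run internally. The names `fit_stump` and `predict` are hypothetical helpers made up for this illustration, and choosing the split by counting training errors is a simplifying assumption (real tree learners usually use measures such as information gain).

```python
# Training data from the question: (weight, gender)
TRAIN = [
    (82, "M"), (71, "F"), (90, "M"), (76, "F"), (88, "M"), (56, "F"),
    (100, "M"), (63, "F"), (84, "M"), (79, "F"), (92, "M"), (66, "F"),
]

# Weights from the second file, where gender is missing
UNLABELED = [70, 100, 69, 76, 99]


def fit_stump(rows):
    """Return (threshold, low_label, high_label, errors) for the best split.

    Hypothetical helper: tries every midpoint between neighbouring weights
    and keeps the split that misclassifies the fewest training rows.
    """
    weights = sorted({w for w, _ in rows})
    labels = sorted({g for _, g in rows})
    best = None
    for lo, hi in zip(weights, weights[1:]):
        threshold = (lo + hi) / 2
        for low_label in labels:
            for high_label in labels:
                if low_label == high_label:
                    continue
                # Count rows whose predicted label differs from the true one.
                errors = sum(
                    (low_label if w <= threshold else high_label) != g
                    for w, g in rows
                )
                if best is None or errors < best[3]:
                    best = (threshold, low_label, high_label, errors)
    return best


def predict(weight, threshold, low_label, high_label):
    """Apply the learned rule to a single weight."""
    return low_label if weight <= threshold else high_label


threshold, low_label, high_label, errors = fit_stump(TRAIN)
predictions = [predict(w, threshold, low_label, high_label) for w in UNLABELED]

print("weight,gender")
for w, g in zip(UNLABELED, predictions):
    print(f"{w},{g}")
```

On this data the only split with zero training errors falls between 79 and 82, so the rule is "F if weight is at most 80.5, otherwise M", and the five unlabeled weights come out as F, M, F, F, M. With only twelve training rows this is a toy result; a tree in Weka or Orange should find a similar threshold, but the exact value may differ.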