I am pretty new to the concepts of machine learning and clustering. I have installed Weka and am trying to figure out how it works. Currently, I have my training data as below.
@relation weather
@attribute year real
@attribute temperature real
@attribute warmer {yes,no}
@data
1956 , 68.98585 , yes
1957 , 67.52131 , yes
1958 , 65.853386 , no
1959 , 66.32705 , yes
1960 , 65.89773 , no
So, I am trying to build a model that predicts, for each year, whether it is getting warmer.
If I have to predict if 1961 is warmer or cooler, should I provide my test data like below?
@relation weather
@attribute year real
@attribute temperature real
@data
1961 , 70.98585
I have removed the column warmer, which I want to predict using the training set I provided earlier. I can use any algorithm that Weka provides (J48, BayesNet, etc.). Can someone please help me understand the concepts?
You don't need to make the training and test sets yourself; Weka can do that for you (for example with cross-validation or a percentage split). Even if you do supply your own test set, don't delete the value to predict from it. Weka will make sure that the value is not used for prediction, but it needs the actual value to determine whether a prediction is correct and to tell you how your model performs.
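If you really have no label for 1961, the test file should still declare the same attributes as the training file, with the unknown class written as `?`, which is the ARFF marker for a missing value. A minimal sketch of producing such a file follows; the helper name `to_arff` is made up for illustration and is not part of Weka.

```python
# Attribute declarations copied from the training file in the question.
HEADER = [
    "@relation weather",
    "@attribute year real",
    "@attribute temperature real",
    "@attribute warmer {yes,no}",
    "@data",
]


def to_arff(rows):
    """Render (year, temperature, warmer) rows as ARFF text.

    A warmer value of None is written as '?', the ARFF missing-value marker.
    (to_arff is a hypothetical helper, not a Weka function.)
    """
    lines = list(HEADER)
    for year, temperature, warmer in rows:
        label = "?" if warmer is None else warmer
        lines.append(f"{year},{temperature},{label}")
    return "\n".join(lines) + "\n"


# The 1961 row from the question, with the class left unknown.
test_arff = to_arff([(1961, 70.98585, None)])
print(test_arff)
```

With a file like this Weka can still output a prediction for 1961, but it cannot report accuracy for that row because there is no true label to compare against.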
Your problem is a classification problem, i.e. you want to predict the label "yes" or "no". Not all of the algorithms in Weka are applicable, but the ones that are not are greyed out (if you use the GUI).
On a more general note, you're unlikely to get good results with the data that you have. This is more of a time series prediction task (i.e. given these past values, how will the series develop in the future), for which Weka doesn't really offer the algorithms. You can find some more information on Wikipedia.
To get better models with Weka, you could add the temperature value from the previous year (or the previous 2 years) as a feature, but ultimately it sounds like you want to use something that can do time series analysis and predictions.
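As a rough sketch of the lag-feature idea, the snippet below adds the previous year's temperature to each row of the training data from the question. It assumes the rows are sorted by year with no gaps; the function name `add_lag` and the attribute name `prev_temperature` are made up for illustration.

```python
# Training rows from the question: (year, temperature, warmer).
rows = [
    (1956, 68.98585, "yes"),
    (1957, 67.52131, "yes"),
    (1958, 65.853386, "no"),
    (1959, 66.32705, "yes"),
    (1960, 65.89773, "no"),
]


def add_lag(rows, lags=1):
    """Insert the previous `lags` temperatures before the label of each row.

    Rows without enough history are dropped.
    Assumes rows are sorted by year with no missing years.
    """
    out = []
    for i in range(lags, len(rows)):
        year, temperature, label = rows[i]
        previous = [rows[i - k][1] for k in range(1, lags + 1)]
        out.append((year, temperature, *previous, label))
    return out


lagged = add_lag(rows, lags=1)

# Write the result as ARFF; prev_temperature is a hypothetical attribute name.
print("@relation weather")
print("@attribute year real")
print("@attribute temperature real")
print("@attribute prev_temperature real")
print("@attribute warmer {yes,no}")
print("@data")
for row in lagged:
    print(",".join(str(value) for value in row))
```

The first year (1956) is dropped because it has no previous value. Calling `add_lag(rows, lags=2)` would add two lagged columns and drop the first two years, so the ARFF header would need a second lag attribute as well.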