Search code examples
machine-learningsvmbayesianlogistic-regression

Machine Learning Algorithm Confusion


I made a small application about cricket prediction using Machine Learning. I took records of 10 years (2001-2011) of ODI matches and prepared a training set.

Now to predict a win or loss for a particular team, I considered various factors.

For example it is an India vs Australia match at Wankhede Stadium, India.

  1. India’s record in past 10 years.

  2. India’s record in past 2 years. (recent form)

  3. India’s record in India in past 10 years.

  4. India’s record in India in past 2 years. (recent form)

  5. India’s record at Wankhede, past 10 years.

  6. India’s record at Wankhede, past 2 years. (recent form)

  7. Australia’s record in past 10 years.

  8. Australia’s record in past two years.

  9. Australia’s record against India in past 10 years.

  10. Australia’s record against India in past 2 years.

  11. Australia’s record against India in past 10 years in India.

  12. Australia’s record against India in past 2 years in India.

So we took probabilities of all, Example, India played 322 matches in10 years and won 140, so the winning probability is 140/322 and so on for all the other factors. Now we added all the probabilities in the end and got a win loss percentage for both the countries. I wanted to know what kind of theorem is it. It started off as Naïve Bayes, but in Naïve Bayes we multiply the probabilities, unlike here. You can check the implementation here, http://www.manzarict.org/cricket We used basic PHP so that we could find probabilities faster using SQL queries. Now this might be a wrong approach to go about this sum, alternative methods are welcome.


Solution

  • This is a trivial linear model, where you don't even fit the weights of the model, but instead use the constant values. Linear models make deficision using

    cl(x) = sgn(<w,x>+b) = sgn( SUM w_i x_i + b )
    

    where x is your data point (x_i is ith feature). In your case, all w_i=1 (you just add all the features, that's all). Callin this "theorem" would be too much, it is just a priori assumed (as you do not train it) trivial (as it consists of constant values, no expert knowledge) linear model (as it uses weighted sum of features).