Tags: statistics, probability

How do I evaluate the effectiveness of an algorithm that predicts probabilities?


I need to evaluate the effectiveness of algorithms that predict the probability of something occurring.

My current approach is to use "root mean squared error" (RMSE), i.e. the square root of the mean of the squared errors, where the error is 1.0 - prediction if the event occurred, or prediction if the event did not occur.
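For concreteness, here is a minimal sketch of that metric in Python, assuming the predictions and binary outcomes are available as NumPy arrays (the function name and the data are illustrative; this quantity is the square root of the Brier score):

```python
import numpy as np

def rmse_of_probabilities(predictions, outcomes):
    """RMSE of probability predictions against binary outcomes (1 = occurred).

    The error is 1.0 - prediction when the event occurred and prediction
    when it did not, which is the same magnitude as outcome - prediction.
    """
    predictions = np.asarray(predictions, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    errors = outcomes - predictions           # matches the error rule above
    return np.sqrt(np.mean(errors ** 2))      # square root of the mean squared error

# Example: three predicted probabilities and whether each event occurred
print(rmse_of_probabilities([0.9, 0.2, 0.6], [1, 0, 0]))
```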

The algorithms have no specific application, but a common use will be to produce a predicted probability of the event occurring for each of a variety of options, and then to select the option with the highest predicted probability. The benefit to us is directly proportional to the rate at which the desired event actually occurs among the options with the highest predicted probabilities.
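Given that use, a metric that tracks the benefit more directly might be the event rate among the top-ranked options. A minimal sketch, with a hypothetical helper and made-up data:

```python
import numpy as np

def hit_rate_at_k(predictions, outcomes, k):
    """Fraction of actual events among the k options with the highest
    predicted probabilities (hypothetical helper for the selection use case)."""
    predictions = np.asarray(predictions, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    top_k = np.argsort(predictions)[::-1][:k]   # indices of the k highest predictions
    return outcomes[top_k].mean()

# Example: pick the top 2 of 5 options and see how often the event occurred
print(hit_rate_at_k([0.8, 0.3, 0.7, 0.1, 0.5], [1, 0, 0, 0, 1], k=2))
```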

It has been suggested that RMSE may not be the best option for this, and I'm interested in the opinions of others.


Solution

  • A chi-square test is a widely used goodness-of-fit test:

    ∑ (Oi - Ei)² / Ei

    where Oi is the observed frequency of outcome i and Ei is the expected frequency. The chi-square test requires a minimum sample size (roughly 5 or 10, depending on the distribution, particularly its degrees of freedom) for each possible outcome. If the sample-size requirement isn't met, apply Yates' correction instead (both forms are sketched in the code after this answer):

    ∑ (|Oi - Ei| - 0.5)² / Ei

    Disclaimer: I'm not a statistician. The above probably misses some of the finer points. I know there's a good reason to use chi-square over RMSE, but I can't remember what it is.

    Look for webpages that discuss hypothesis testing.
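A minimal sketch of the statistic described above, assuming the predictions define a probability for each discrete outcome and that SciPy is available (the probabilities and counts are made up for illustration):

```python
import numpy as np
from scipy.stats import chisquare

# Illustrative data: a model assigns probabilities to 4 possible outcomes,
# and we observe how often each outcome actually occurred over 100 trials.
predicted_probs = np.array([0.1, 0.2, 0.3, 0.4])   # hypothetical model output
observed_counts = np.array([14, 18, 25, 43])        # hypothetical observations

# Scale expected counts to the same total; chisquare requires matching totals.
expected_counts = predicted_probs * observed_counts.sum()

# Same statistic as the formula above: sum((O - E)^2 / E)
stat, p_value = chisquare(f_obs=observed_counts, f_exp=expected_counts)
print(stat, p_value)

# Equivalent manual computation, applying Yates' correction when any
# expected count falls below the usual small-sample threshold (~5):
if (expected_counts < 5).any():
    stat = np.sum((np.abs(observed_counts - expected_counts) - 0.5) ** 2 / expected_counts)
else:
    stat = np.sum((observed_counts - expected_counts) ** 2 / expected_counts)
print(stat)
```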