If I have a confusion matrix that is based on a sample set, how do I determine the confidence interval (margin of error) for my recall/precision/etc. metrics? I know how to do this for the probability of conversion itself, but how do I do it for recall/precision?
Found the answer to this. It is a slightly modified version of the standard confidence interval calculation, p +/- Z_score_at_alpha * std_error. The only difference is that p (here, your recall) is computed with an offset -> adjusted_recall=(TP+2)/(TP+FN+4). For precision, use FP in place of FN.
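The general idea is that the standard confidence interval equation doesn't work when p is at (or very close to) 0 or 1: the standard error collapses to zero, so the interval has no width. Adding 2 successes and 2 failures pulls the estimate slightly toward 0.5, which keeps the interval usable. It looks like a fudge factor, but the 2 and 4 come from z^2/2 and z^2 with z ≈ 1.96, so the adjustment is meant for a 95% interval.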
Also, the std error is now sqrt(adjusted_recall*(1-adjusted_recall)/(N+4)), where N = TP+FN. This is known as the "plus four" (Agresti-Coull) interval, a simple approximation to the Wilson score interval - https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval
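For anyone who wants to try this, here is a minimal Python sketch of the calculation described above. The function names (plus_four_interval, recall_interval, precision_interval) are my own and not from any library, and clipping the bounds to [0, 1] is an extra assumption on my part. The +2/+4 offset is tuned for a 95% interval, so other confidence levels are only approximate.

```python
import math
from statistics import NormalDist


def plus_four_interval(successes, trials, confidence=0.95):
    """Adjusted ("plus four") interval for a binomial proportion.

    Returns (adjusted_p, lower, upper). The +2/+4 offset matches z ~ 1.96,
    so it is intended for confidence=0.95; other levels are approximate.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")

    # Two-sided z-score, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    # Add 2 successes and 2 failures before estimating the proportion.
    adjusted_p = (successes + 2) / (trials + 4)
    std_error = math.sqrt(adjusted_p * (1 - adjusted_p) / (trials + 4))

    # Clip to [0, 1], since a proportion cannot fall outside that range
    # (assumption: the clipping is not part of the formula above).
    lower = max(0.0, adjusted_p - z * std_error)
    upper = min(1.0, adjusted_p + z * std_error)
    return adjusted_p, lower, upper


def recall_interval(tp, fn, confidence=0.95):
    # Recall = TP / (TP + FN), so the "trials" are all actual positives.
    return plus_four_interval(tp, tp + fn, confidence)


def precision_interval(tp, fp, confidence=0.95):
    # Precision = TP / (TP + FP), so the "trials" are all predicted positives.
    return plus_four_interval(tp, tp + fp, confidence)


if __name__ == "__main__":
    adjusted, low, high = recall_interval(tp=80, fn=20)
    print(f"adjusted recall={adjusted:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

Note that even with perfect observed recall (FN = 0) the interval still has nonzero width, which is the whole point of the adjustment.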
http://www.stat.ucdavis.edu/~kwwong/STA13-SS1-12/Statistics_13_files/lecture05.pdf
https://stats.stackexchange.com/questions/109429/wilsons-adjustment-for-sample-proportion