image-processing machine-learning feature-extraction entropy information-extraction

How to calculate the information gain of a continuous feature

I'm having trouble finding the right parameters for the information gain when I don't have any discrete values and therefore first need to discretize the points into intervals.

What I have:

I'm doing image processing, where my features have a possible range of 0-255. From some training data I can define intervals, which only indicate "is object" or "is not object". Let goods be the number of intervals that match the object and bads the number of intervals labeled as its environment (background). I then calculate the ratio as

$ratio=\frac{goods}{allIntervalls}$

and the information gain for this case as:

$IG=Entropy\left(\frac{goods}{allIntervalls}\right)-ratio\left(Entropy\left(\frac{goods}{allIntervalls}\right)+Entropy\left(\frac{bads}{allIntervalls}\right)\right)$

where

$Entropy(p)=-p\log(p)$
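A literal transcription of the formulas above shows that the expression itself can go negative, independent of any counting bug. This sketch assumes `allIntervalls = goods + bads` (i.e. good + bad is the whole distribution) and uses the natural log; the function names are hypothetical.

```python
import math

def entropy_term(p: float) -> float:
    """The question's Entropy(p) = -p*log(p), with the convention 0*log(0) = 0."""
    return 0.0 if p == 0 else -p * math.log(p)

def question_ig(goods: int, bads: int) -> float:
    """The IG formula exactly as written above (this is NOT standard information gain)."""
    all_intervals = goods + bads  # assumption: allIntervalls = goods + bads
    ratio = goods / all_intervals
    return entropy_term(ratio) - ratio * (
        entropy_term(goods / all_intervals) + entropy_term(bads / all_intervals)
    )

print(question_ig(1, 1))  # 0.0
print(question_ig(3, 1))  # about -0.206, i.e. negative
```

With 3 good and 1 bad interval the result is already negative, so the sign problem comes from the formula, not from the data.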

Results and idea:

For some reason I end up with a negative IG, which is nonsense, but I don't see the error. Another idea was, instead of counting the object-matching intervals for goods, to count the samples in goods that fall into any good interval.

Does anyone have an idea?

Solution

I don't see what in your setup plays the role of the "before" and "after" (or P and Q) distributions.

Have you changed anything to go from one situation to another? It's unclear.

Look at the question: What is "entropy and information gain"?

It seems that good + bad represent the whole distribution.

So something has to change to take you from one (good, bad) distribution to another (good, bad) distribution.

Then apply the formula correctly, or follow the example there.
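For a continuous feature, one standard way to make "before" and "after" concrete (assuming the change is splitting the samples at a threshold $t$ on the feature value) is the following. Here $g, b$ are the object/background sample counts before the split, $g_L, b_L$ the counts with value $\le t$, and $g_R, b_R$ the counts with value $> t$, so $g = g_L + g_R$ and $b = b_L + b_R$:

$$H(p) = -p\log_2 p - (1-p)\log_2(1-p), \qquad 0\log_2 0 := 0$$

$$IG(t) = H\left(\frac{g}{g+b}\right) - \frac{g_L+b_L}{g+b}\,H\left(\frac{g_L}{g_L+b_L}\right) - \frac{g_R+b_R}{g+b}\,H\left(\frac{g_R}{g_R+b_R}\right)$$

Note that the binary entropy $H(p)$ has two terms; $-p\log(p)$ alone is only the contribution of one class. Because entropy is concave, the weighted entropy after the split never exceeds the entropy before it, so $IG(t) \ge 0$.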

Your formula seems to be messed up.
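As a sketch of one common alternative (the threshold-split approach used by decision-tree learners such as C4.5 and CART, swapped in here rather than taken from the question), you can count samples instead of intervals, which matches the question's second idea, and search for the cut point on the 0-255 feature that maximizes the information gain. The function names and the boolean labeling (True = object, False = background) are hypothetical.

```python
import math
from typing import Sequence, Tuple

def binary_entropy(pos: int, neg: int) -> float:
    """Entropy in bits of a (pos, neg) class distribution; 0 for empty or pure sets."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:  # skip zero counts: 0 * log(0) is taken as 0
            p = count / total
            h -= p * math.log2(p)
    return h

def best_threshold_ig(values: Sequence[int], labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, information gain) of the best binary split 'value <= t'.

    values: feature values, e.g. pixel intensities 0-255
    labels: True for object samples, False for background samples
    If no split has positive gain, the threshold is NaN and the gain 0.0.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(1 for _, y in pairs if y)
    total_neg = n - total_pos
    parent = binary_entropy(total_pos, total_neg)  # entropy "before" the split

    best_t, best_ig = float("nan"), 0.0
    left_pos = left_neg = 0
    for i in range(n - 1):
        v, y = pairs[i]
        if y:
            left_pos += 1
        else:
            left_neg += 1
        next_v = pairs[i + 1][0]
        if v == next_v:
            continue  # only cut between distinct feature values
        n_left = i + 1
        right_pos, right_neg = total_pos - left_pos, total_neg - left_neg
        # weighted entropy "after" the split
        children = (n_left / n) * binary_entropy(left_pos, left_neg) + \
                   ((n - n_left) / n) * binary_entropy(right_pos, right_neg)
        ig = parent - children
        if ig > best_ig:
            best_t, best_ig = (v + next_v) / 2, ig
    return best_t, best_ig

t, ig = best_threshold_ig([10, 20, 30, 200, 210, 220],
                          [False, False, False, True, True, True])
print(t, ig)  # 115.0 1.0
```

In this toy example the dark pixels are all background and the bright ones all object, so the cut at 115 separates the classes perfectly and the gain equals the full 1 bit of parent entropy. The gain returned is never negative.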