Tags: statistics, machine-learning, information-theory

Entropy and Information Gain


If I have a set of data like this:

Classification  attribute-1  attribute-2

Correct         dog          dog 
Correct         dog          dog
Wrong           dog          cat 
Correct         cat          cat
Wrong           cat          dog
Wrong           cat          dog

Then what is the information gain of attribute-2 relative to attribute-1?

I've computed the entropy of the whole data set: -(3/6)log2(3/6)-(3/6)log2(3/6)=1

Then I'm stuck! Do I need to calculate the entropies of attribute-1 and attribute-2 as well, and then use these three values in an information gain calculation?


Solution

  • Well, first you have to calculate the entropy for each value of an attribute, then the weighted average of those entropies; the information gain is the entropy of the whole data set minus that weighted average. Here is how it works for attribute-1.

    for attribute-1:

    attr-1 = dog:
    info([2c,1w]) = entropy(2/3, 1/3) = -(2/3)log2(2/3) - (1/3)log2(1/3) ≈ 0.918

    attr-1 = cat:
    info([1c,2w]) = entropy(1/3, 2/3) ≈ 0.918
    

    Weighted average for attribute-1:

    info([2c,1w],[1c,2w]) = (3/6)*info([2c,1w]) + (3/6)*info([1c,2w]) ≈ 0.918
    

    Gain for attribute-1:

    gain("attr-1") = info([3c,3w]) - info([2c,1w],[1c,2w]) = 1 - 0.918 ≈ 0.082
    

    Then you do the same calculation for attribute-2 and compare the two gains; if you want to verify the arithmetic, see the sketch below.
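
    Here is a minimal Python sketch of the same calculation, in case you want to check the numbers. The helper names entropy and information_gain, and the list encoding of the table, are my own choices for illustration rather than anything from a standard library API.

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Shannon entropy (in bits) of a list of class labels.
        total = len(labels)
        return -sum((n / total) * log2(n / total)
                    for n in Counter(labels).values())

    def information_gain(labels, attribute_values):
        # Entropy of the whole set minus the weighted average entropy
        # of the subsets produced by splitting on the attribute.
        total = len(labels)
        subsets = {}
        for label, value in zip(labels, attribute_values):
            subsets.setdefault(value, []).append(label)
        weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
        return entropy(labels) - weighted

    # The six rows from the question.
    classification = ["Correct", "Correct", "Wrong", "Correct", "Wrong", "Wrong"]
    attribute_1    = ["dog", "dog", "dog", "cat", "cat", "cat"]
    attribute_2    = ["dog", "dog", "cat", "cat", "dog", "dog"]

    print(entropy(classification))                        # 1.0
    print(information_gain(classification, attribute_1))  # ~0.082
    print(information_gain(classification, attribute_2))  # 0.0

    Running it reproduces the numbers above: the whole-set entropy is 1 bit and attribute-1 gains about 0.082 bits, while attribute-2 gains nothing, because splitting on attribute-2 leaves both subsets evenly mixed between Correct and Wrong.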