c++, c, entropy

Shannon's Entropy Algorithm Returning Negative Values


note: project is in C, not C++

float entr = 0;
for(int i = 0; i < map->entry_count; ++i)
{
    float x = (float)map->entry[i].occ / map->entry_count;
    if(x > 0)
    {
        entr -= x * log(x) / log(2);
    }
}
return entr;

When my map has 3 occurrences of value 15, the entropy returned is -4.214...

When my map has 3 occurrences of value 15 and one occurrence of value 25, the entropy returned is 0.000.

Something is clearly wrong here. I don't really understand the maths behind it, so I got the algorithm off the internet, but I'd really like to get it to work.

BTW, the map is written by me. I cannot provide the code (it's for production), but:

  • occ = the number of occurrences of a single value
  • entry_count = the number of distinct values in the map. It does not increase when occurrences increase; it increases only when a new value is added

So basically it works just like a C++ map, to my knowledge.


Solution

  • The x in x * log(x) should be the probability of the item, that is, the number of times the item occurs divided by the total number of occurrences of all items. The code instead computes map->entry[i].occ / map->entry_count, which appears to be the number of times the item occurs divided by the number of distinct items. That makes x larger than it should be, and whenever x exceeds 1, log(x) is positive and the term subtracted from entr drives the result negative.

    For example, if the 3 events/items/things A, B, and C occur 10, 13, and 17 times, respectively, then the x values should be 10/40, 13/40, and 17/40, but the code in the question appears to use 10/3, 13/3, and 17/3.