note: project is in C, not C++
float entr = 0;
for(int i = 0; i < map->entry_count; ++i)
{
    float x = (float)map->entry[i].occ / map->entry_count;
    if(x > 0)
    {
        entr -= x * log(x) / log(2);
    }
}
return entr;
When my map has 3 occurrences of value 15, the entropy returned is -4.214.. something.
When my map has 3 occurrences of value 15 and one occurrence of value 25, the entropy returned is 0.000.
Something is clearly wrong here. I don't really understand the maths behind it, so I got the algorithm off the internet, but I'd really like to get it to work.
BTW, the map is written by me. I cannot provide the code (it's for production), but:
So basically it works just like a C++ map, to my knowledge.
The x in x * log(x) should be the probability of the item. That could be calculated as the number of times the item occurs divided by the total number of occurrences of all items. But it is calculated as map->entry[i].occ / map->entry_count, which appears to be the number of times the item occurs divided by the number of distinct items. This makes it greater than it should be.
For example, if the 3 events/items/things A, B, and C occur 10, 13, and 17 times, respectively, then the x values should be 10/40, 13/40, and 17/40, but the source code in the question appears as if it would use 10/3, 13/3, and 17/3.