node.js, dictionary, set, histogram

Array Set Length unequal to Sum of Array Distribution


I have a list of user ids like so: ["1111","5555","1111","8983",...]. I then compute the distribution of how often each id occurs. But somehow the sum of the distribution's bin counts is smaller than the number of unique users in the set.

function histogram(list) {
    // Count how often each value occurs; the values become keys of a plain object.
    const d = {};
    for (const x of list) {
        if (x in d) {
            d[x] += 1;
        } else {
            d[x] = 1;
        }
    }
    return d;
}

var featureuserids = f1_users.concat(f2_users,f3_users,f4_users)
var featureusers = [...new Set(featureuserids)];
const featurehist = histogram(Object.values(histogram(featureuserids)))
const n_featureusers = featureusers.length

Here is an example output.

Feature Users: 17379
Feature Hist: { '1': 16359, '2': 541, '3': 93, '4': 6 }

What is my mistake?


Solution

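  • I have found the answer. One of my lists (f1_users) stored the ids as integers, while the others stored them as strings. A Set treats 1111 and "1111" as two distinct values, whereas plain-object keys are always coerced to strings, so the histogram merged such ids while the set counted them twice. After converting all ids to strings, the issue was fixed.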
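
The mismatch can be reproduced with a minimal sketch. The sample arrays `f1` and `f2` below are hypothetical stand-ins for the real user lists, and the variable names are chosen only for illustration. In the output from the question, the bins sum to 16359 + 541 + 93 + 6 = 16999, which is 380 fewer than the 17379 entries in the set, consistent with 380 ids being present in both integer and string form.

```javascript
// Hypothetical sample data: f1 stores ids as numbers, f2 stores them as strings.
const f1 = [1111, 5555];
const f2 = ["1111", "8983"];
const ids = f1.concat(f2);

// A Set compares values without type coercion, so 1111 and "1111" are distinct.
const uniqueMixed = new Set(ids).size; // 4

// Plain-object keys are coerced to strings, so 1111 and "1111" share one key.
const counts = {};
for (const id of ids) {
  counts[id] = (counts[id] || 0) + 1;
}
const keysMixed = Object.keys(counts).length; // 3

// Normalizing every id to a string makes both counts agree.
const normalized = ids.map(String);
const uniqueNormalized = new Set(normalized).size; // 3

console.log(uniqueMixed, keysMixed, uniqueNormalized); // 4 3 3
```

Converting the ids once, right after concatenating the lists, keeps the set and the histogram consistent. An alternative would be to count with a `Map` instead of a plain object, since a `Map` keeps key types apart the same way a `Set` does, but then the integer and string forms of one user would still be treated as two users.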