Tags: javascript, node.js, arrays, sorting, aggregate

JavaScript large array of objects cannot be sorted for aggregation


We are trying to process a snapshot from a database. The array holds over 2.5 million objects, and each object has an index property that increments from an arbitrary starting number, in this case 45000.

Sorting with myArray.sort((a,b)=> a.index - b.index) leaves the array fragmented: the first and last elements are right, but the elements in between are not in consecutive index order.

Example:

  • dataset index starts at 45000
  • myArray[0] correctly logs 45000
  • myArray[myArray.length-1] correctly logs 2545000
  • myArray[1] is incorrectly 45007

I thought the data was missing from the snapshot, but myArray.findIndex(e => e.index == 45001) returns position 12532, and the value is present in the serialized JSON file, which is 1.4 GB in size.

I serialized the snapshot with a read/write stream, with each line containing one JSON.stringified object.

Should I move to a Map instead of an array? Would .get() be efficient? I am currently using a for loop that steps through the expected index values and looks up each object with find() to ensure they are incremental, but it is by far the slowest method.

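for (let i = 0; i <= maxIndex - minIndex; i++) { // <= so that maxIndex itself is checked
    // find() scans the array from the start on every iteration
    let obj = myArray.find(e => e.index == i + minIndex)
    if (!obj) {
        console.log("missing index", i + minIndex);
        continue;
    }
    // process object
}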
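On the Map question: a minimal sketch of what a Map-based lookup could look like, assuming every index is a number and unique. The names sampleArray, buildIndexMap and findMissing are made up for illustration and the data is a tiny stand-in for the real snapshot. The Map is built once in a single pass, so each check is a keyed lookup instead of a fresh find() scan.

```javascript
// Hypothetical sample data standing in for the real snapshot.
const sampleArray = [
  { index: 45002, value: "c" },
  { index: 45000, value: "a" },
  { index: 45003, value: "d" }, // 45001 is deliberately missing
];

// Build the Map in one pass over the array.
function buildIndexMap(items) {
  const byIndex = new Map();
  for (const item of items) {
    byIndex.set(item.index, item);
  }
  return byIndex;
}

// Walk the inclusive range [minIndex, maxIndex] and collect absent indexes.
function findMissing(byIndex, minIndex, maxIndex) {
  const missing = [];
  for (let i = minIndex; i <= maxIndex; i++) {
    if (!byIndex.has(i)) missing.push(i);
  }
  return missing;
}

const byIndex = buildIndexMap(sampleArray);
const missing = findMissing(byIndex, 45000, 45003);
console.log(missing); // [ 45001 ]
```

One caveat: Map keys are compared without type coercion, so the number 45001 and the string "45001" are different keys. If the index values were parsed as strings, convert them with Number() before calling set() and has().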

Solution

  • There appear to be some technical limitations with .sort() on larger arrays. If you know the data set is large in volume, it is easier to place each object into a new array at the slot given by its index:

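        let newArray = new Array(maxIndex - minIndex + 1).fill(null);
        // offset by minIndex so the smallest index lands in slot 0
        myArray.forEach(e => newArray[e.index - minIndex] = e);
        for (let i = 0; i < newArray.length; i++) {
            // newArray[i] is the object with index i + minIndex, or null if it is missing
        }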
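A self-contained sketch of the same idea, assuming the array is non-empty and every index is a unique integer. The names snapshot and placeByIndex are hypothetical, and the three-object array stands in for the real 2.5 million. It also computes minIndex and maxIndex with a plain loop, since the snippet above takes them as given.

```javascript
// Hypothetical sample data; the real array holds about 2.5 million objects.
const snapshot = [
  { index: 45003, value: "d" },
  { index: 45000, value: "a" },
  { index: 45002, value: "c" }, // 45001 is deliberately missing
];

// Place each object at slot (index - minIndex); no comparison sort is involved.
// Assumes items is non-empty and every index is a unique integer.
function placeByIndex(items) {
  let minIndex = Infinity;
  let maxIndex = -Infinity;
  // A plain loop is used instead of Math.min(...values) because spreading
  // millions of arguments can exceed the engine's argument limit.
  for (const item of items) {
    if (item.index < minIndex) minIndex = item.index;
    if (item.index > maxIndex) maxIndex = item.index;
  }
  const ordered = new Array(maxIndex - minIndex + 1).fill(null);
  for (const item of items) {
    ordered[item.index - minIndex] = item;
  }
  return { ordered, minIndex };
}

const { ordered, minIndex } = placeByIndex(snapshot);
for (let i = 0; i < ordered.length; i++) {
  if (ordered[i] === null) {
    console.log("missing index", i + minIndex); // prints: missing index 45001
    continue;
  }
  // process ordered[i]
}
```

This takes two linear passes over the data, and the slots left as null double as the missing-index check from the question.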