deepstream.io

Using deepstream List for tens of thousands of unique values


I wonder whether it's a good or bad idea to use deepstream's record.getList for storing a large number of unique values, for example emails or any other unique identifiers. The main purpose is to be able to answer quickly whether we already have, say, a user with a given email (email in use), or another record identified by a specific unique field.

I ran a few experiments today and hit two problems: 1) when I tried to populate the list with a few thousand values, I got

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory

and my deepstream server went down. I was able to work around it by giving the server's Node process more memory with this flag:

--max-old-space-size=5120

It doesn't feel right, but it allowed me to build a list with more than 5,000 items.

2) That wasn't enough for my tests, so I pre-created a list of 50,000 items by writing the data directly into the RethinkDB table, and got another error when getting or modifying the list:

RangeError: Maximum call stack size exceeded

I was able to fix it with another flag:

--stack-size=20000

It helps, but I believe it's only a matter of time before one of these errors shows up in production once the list reaches a certain size. I don't really know whether it's a Node.js, JavaScript, deepstream, or RethinkDB issue. All of this makes me think I'm using deepstream List the wrong way. Please let me know. Thank you in advance!


Solution

  • While you can use lists to store arrays of strings, they are actually intended as collections of record names: the actual data is stored in the records themselves, and the list only manages the order of the records.

    Having said that, there are two open GitHub issues to improve performance for very long lists, by sending more efficient deltas and by introducing a pagination option.

    Interesting results regarding memory, though; that's definitely something that needs to be handled more gracefully. In the meantime you can drastically improve performance by combining many updates into one:

    var myList = ds.record.getList( 'super-long-list' );
    
    // Bad: sends 10,000 individual messages
    for( var i = 0; i < 10000; i++ ) {
        myList.addEntry( 'something-' + i );
    }
    
    // Good: sends a single message
    var entries = [];
    for( var i = 0; i < 10000; i++ ) {
        entries.push( 'something-' + i );
    }
    
    myList.setEntries( entries );
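
To make the record-per-value approach concrete: one way to answer "is this email in use?" without scanning a huge list is to derive a deterministic record name from the email and check for that record directly. This is only a sketch; `emailToRecordName` is a hypothetical helper, and the commented usage assumes a connected client whose record API offers an existence check such as `record.has`.

```javascript
// Hypothetical helper: derive a deterministic record name from an email,
// so "email in use?" becomes a direct record lookup instead of a list scan.
function emailToRecordName( email ) {
    // Normalise first, then encode so the result is safe as a record path
    return 'email/' + encodeURIComponent( email.trim().toLowerCase() );
}

// Usage sketch, assuming a connected deepstream client `ds`:
// ds.record.has( emailToRecordName( 'Foo@Bar.com' ), function( error, hasRecord ) {
//     // hasRecord is true if a record for this email already exists
// } );
```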
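
Since the question is about unique identifiers, it's also worth noting that `setEntries` replaces the list contents wholesale, so deduplicating on the client before that single update keeps repeated record names out of the list. A small sketch, where `uniqueEntries` is a hypothetical helper:

```javascript
// Hypothetical helper: drop duplicate record names while preserving
// first-seen order, before handing the array to setEntries.
function uniqueEntries( entries ) {
    return Array.from( new Set( entries ) );
}

// myList.setEntries( uniqueEntries( entries ) );
```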