resolvejs

Performance issues with large datasets


Is there any way of filtering the events in a projection associated with a read model by the aggregateId?

In the tests carried out we always receive all registered events. Is it possible to apply filters in a previous stage?

We have 100,000 aggregateIds, and each id has 15,000 associated events. Since we are unable to filter by aggregateId, our projections have to iterate over all events.


Solution

  • So you have 100,000 aggregates with 15,000 events each.

    You can use ReadModel or ViewModel:

    Read Model:

    A read model can be seen as a read database for your app. So if you want to store some data about each aggregate, you should insert or update a row or entry in some table for each aggregate; see the Hacker News example read model code.
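    A minimal sketch of what such a read model projection could look like (the event names, table name, and state shape are assumptions for illustration, not taken from the question; the store API shown follows the reSolve read model store interface, and a tiny in-memory stand-in for the store is included just so the handlers can be exercised):

```javascript
// Hypothetical read model projection: each handler touches only the row
// for its own aggregateId, so queries read prepared rows instead of
// replaying all events. Event and table names are assumed.
const projection = {
  Init: async (store) => {
    await store.defineTable('Aggregates', {
      indexes: { id: 'string' },
      fields: ['eventCount']
    })
  },
  ITEM_CREATED: async (store, event) => {
    await store.insert('Aggregates', { id: event.aggregateId, eventCount: 1 })
  },
  ITEM_UPDATED: async (store, event) => {
    await store.update(
      'Aggregates',
      { id: event.aggregateId },
      { $inc: { eventCount: 1 } }
    )
  }
}

// Tiny in-memory stand-in for the reSolve store, only to demonstrate
// the handlers above without a running app:
function createMockStore() {
  const tables = {}
  return {
    tables,
    async defineTable(name) { tables[name] = new Map() },
    async insert(name, row) { tables[name].set(row.id, { ...row }) },
    async update(name, cond, mod) {
      const row = tables[name].get(cond.id)
      for (const [key, delta] of Object.entries(mod.$inc || {})) {
        row[key] += delta
      }
    }
  }
}

async function demo() {
  const store = createMockStore()
  await projection.Init(store)
  await projection.ITEM_CREATED(store, { aggregateId: 'a1' })
  await projection.ITEM_UPDATED(store, { aggregateId: 'a1' })
  await projection.ITEM_UPDATED(store, { aggregateId: 'a1' })
  return store.tables['Aggregates'].get('a1').eventCount
}
```

    Once built, a query against such a read model is a plain table lookup by aggregateId rather than an iteration over all events.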

    It is important to understand that reSolve read models are built on demand, on the first query. If you have a lot of events, this may take some time.

    Another thing to consider: a newly created reSolve app is configured to use an in-memory database for read models, so they are rebuilt on each app start.

    If you have a lot of events and don't want to wait for read models to rebuild each time you start the app, you have to configure real database storage for your read models.

    Configuring adapters is not well documented yet; we'll fix this. Here is what you need to write in the relevant config file for MongoDB:

    readModelAdapters: [
      {
        name: 'default',
        module: 'resolve-readmodel-mongo',
        options: {
          url: 'mongodb://127.0.0.1:27017/MyDatabaseName',
        }
      }
    ]
    

    Since you have a database engine, you can use it for an event store too:

    storageAdapter: {
      module: 'resolve-storage-mongo',
      options: {
        url: 'mongodb://127.0.0.1:27017/MyDatabaseName',
        collectionName: 'Events'
      }
    }
    

    View Model:

    A view model is built on the fly during the query. It does not require storage, but it reads all events for the given aggregateId.
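    A view model projection is a set of pure reducer-style functions that fold the events of one aggregateId into a state object. A minimal sketch (event names and state shape are assumptions for illustration, not from the question):

```javascript
// Hypothetical view model projection: pure (state, event) => state
// functions, folded over the events of a single aggregateId.
const viewModelProjection = {
  Init: () => ({ itemCount: 0 }),
  ITEM_CREATED: (state, event) => ({ ...state, itemCount: 1 }),
  ITEM_UPDATED: (state, event) => ({ ...state, itemCount: state.itemCount + 1 })
}

// Applying the projection manually, the way reSolve conceptually does
// when it replays the events for one aggregateId on a query:
const events = [
  { type: 'ITEM_CREATED', aggregateId: 'a1' },
  { type: 'ITEM_UPDATED', aggregateId: 'a1' },
  { type: 'ITEM_UPDATED', aggregateId: 'a1' }
]

const finalState = events.reduce(
  (state, event) => viewModelProjection[event.type](state, event),
  viewModelProjection.Init()
)
```

    With 15,000 events per aggregate this fold is expensive on the first query, which is exactly what snapshots (below) address.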

    reSolve view models use snapshots. So if you have 15,000 events for a given aggregate, then on the first request all of those events are applied to calculate the view model state for the first time. After this, the state is saved, and all subsequent requests read a snapshot plus any later events. By default, a snapshot is taken every 100 events. So on the second query, reSolve reads a snapshot for this view model and applies no more than 100 events to it.

    Again, keep in mind that if you want the snapshot storage to be persistent, you should configure a snapshot adapter:

    snapshotAdapter: {
      module: 'resolve-snapshot-lite',
      options: {
        pathToFile: 'path/to/file',
        bucketSize: 100
      }
    }
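    For orientation, the three adapter fragments above would sit side by side in a single reSolve config object (the exact config file name depends on your app template, e.g. config.prod.js, which is an assumption, not from the original):

```javascript
// Sketch: the adapter fragments combined in one config object.
// Values are copied from the fragments above.
export default {
  readModelAdapters: [
    {
      name: 'default',
      module: 'resolve-readmodel-mongo',
      options: {
        url: 'mongodb://127.0.0.1:27017/MyDatabaseName'
      }
    }
  ],
  storageAdapter: {
    module: 'resolve-storage-mongo',
    options: {
      url: 'mongodb://127.0.0.1:27017/MyDatabaseName',
      collectionName: 'Events'
    }
  },
  snapshotAdapter: {
    module: 'resolve-snapshot-lite',
    options: {
      pathToFile: 'path/to/file',
      bucketSize: 100
    }
  }
}
```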
    

    A view model has one more benefit: if you use the resolve-redux middleware on the client, the view model is kept up to date there, reactively applying the events that the app receives via WebSockets.