Search code examples
mongodbmongodb-queryquery-optimization

how to optimize a mongodb `distinct` query with a date constraint?


given a collection called probe which grows by 11 million records per week and contains a record of vulnerability exploit probes that include a date property and a source.ip property, ie:

{
  _id: ObjectId('64943e2001000100160b843e'),
  date: '2023-06-22T12:27:12Z',
  source: {
    ip: '217.17.230.18',
    port: 56882
  },
  target: {
    ip: '88.198.64.58',
    port: 22
  }
}

what type of index would improve the performance of a query which selects distinct source.ip where date is greater than a variable date? ie (node.js):

export const recent = async (since) => (
  await client.db('cichlid').collection('probe').distinct(
    'source.ip',
    {
      date: {
        $gte: since.toISOString().replace('.000Z', 'Z')
      }
    }
  )
);

the query is run from an aws lambda behind api gateway which limits it to a 30 second timeout. there are currently no indexes (other than the automatically created ones) on the collection and it has started timing out as the collection size has grown.

  • do i need a compound index for both fields (date, source.ip)?
  • should i create a simple index on date?
  • should i create a simple index on source.ip?

compass gives me a choice of index types (asc, desc, 2dsphere, text). since the date filter is generally now() minus 1 hour, i assume desc is appropriate for that field and text might be appropriate for the ip address. but i'm only guessing... enter image description here


Solution

  • i went ahead and created 3 indexes:

    • compound: date (desc) and source.ip (text)
    • simple: date (desc)
    • simple: source.ip (text)

    compass shows that only the simple date (desc) index gets used. the distinct source.ip query went from taking 44 seconds to complete, to 500 milliseconds. i guess this answers the question. enter image description here

    in this particular case, the index needs to be on the filter constraint field and this increases performance by a factor of 88x.