Search code examples
c#mongodbmongodb-.net-driver

MongoDB how to get K documents start from the middle of the collection in a single query?


I have N records which match a query q in a collection (eg. messages) of MongoDB. And I want to get the documents in range [N/2, N/2 + 100).

Without knowing the value of N, I can do this by issue 2 query:

  1. use N = db.messages.find(q).count() to get N, then compute the offset with skipCount = N / 2 - 1;
  2. use db.messages.find(q).skip(skipCount).limit(100) to get the results

Is there a way (especially in .net MongoDB.Driver 2.7.2) to merge the 2 query into a single one to improve the performance?


Solution

  • You need $facet operator to run more than one aggregation pipeline simultaneously and then process the result using result returned by multiple pipelines.

    db.col.aggregate([
        {
            $match: q
        },
        {
            $facet: {
                count: [{ $count: "total" }],
                docs: [ { $match: q } ] // this is supposed to pass all input documents to the output but we can't specify empty array here thus we can repeat q
            }
        },
        {
            $unwind: "$count"
        },
        {
            $project: {
                docs: {
                    $slice: [ "$docs", { $multiply: [ -1, { $ceil: { $divide: [ "$count.total", 2 ] } } ] } ]
                }
            }
        },
        {
            $unwind: "$docs"
        },
        {
            $replaceRoot: {
                newRoot: "$docs"
            }
        }
    ])
    

    Each stage defined in $facet will return an array however we know that count should contain only one element so we can use $unwind on it. Second field (docs) will contain all the elements that were returned after q query. To take last k elements you can use $slice passing negative value as second parameter (takes last k elements). Then you need to transformed sliced array back to original shape so you need $unwind and $replaceRoot.

    Since the aggregation is a bit complicated probably the best option in C# is to use BsonDocument class, try:

    FilterDefinition<BsonDocument> q = // your query
    
    AggregateFacet<BsonDocument> countFacet = 
        AggregateFacet.Create<BsonDocument, BsonDocument>("count",
        PipelineDefinition<BsonDocument, BsonDocument>.Create(new IPipelineStageDefinition[] {
            new BsonDocumentPipelineStageDefinition<BsonDocument, BsonDocument>(BsonDocument.Parse("{ $count: \"total\" }"))
        }));
    
    AggregateFacet<BsonDocument> matchFacet =
        AggregateFacet.Create<BsonDocument, BsonDocument>("docs",
        PipelineDefinition<BsonDocument, BsonDocument>.Create(new IPipelineStageDefinition[] {
            PipelineStageDefinitionBuilder.Match(q)
        }));
    
    
    var projection = new BsonDocumentProjectionDefinition<BsonDocument>(
        BsonDocument.Parse("{ docs: { $slice: [ \"$docs\", { $multiply: [ -1, { $ceil: { $divide: [ \"$count.total\", 2 ] } } ] } ] } }"));
    
    var replaceRoot = new BsonValueAggregateExpressionDefinition<BsonDocument, BsonDocument>("$docs");
    
    var result = Col.Aggregate()
                    .Match(q)
                    .Facet<BsonDocument>(new[] { countFacet, matchFacet })
                    .Unwind("count")
                    .Project(projection)
                    .Unwind("docs")
                    .ReplaceRoot(replaceRoot)
                    .ToList<BsonDocument>();