Tags: mongodb, mongodb-compass

MongoDB Compass shows the wrong minimum value in the data distribution of a key


I'm on MongoDB Compass version 1.5.1 for Mac.

When I look at the distribution of values, Compass returns plots like the following:

[Image: values distribution]

As you can see, min and max values are shown, but the min values are wrong. I know the minimum values of those two keys are 1 and 1, not 9 and 13.

Does anyone know how to fix this problem?


Solution

  • Got it. The standard report is based on a sample of at most 1000 documents, so the minimum shown in the plot is the minimum of the sample, not of the whole collection.

    From the docs:

    Sampling in MongoDB Compass is the practice of selecting a subset of data from the desired collection and analyzing the documents within the sample set.

    Sampling is commonly used in statistical analysis because analyzing a subset of data gives similar results to analyzing all of the data. In addition, sampling allows results to be generated quickly rather than performing a potentially long and computationally expensive collection scan.

    MongoDB Compass employs two distinct sampling mechanisms.

    Collections in MongoDB 3.2 are sampled via the $sample operator in the aggregation framework of the core server. This provides efficient random sampling without replacement over the entire collection, or over the subset of documents specified by a query.
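
To make the $sample step concrete, here is a minimal sketch of what such an aggregation pipeline can look like. The helper name build_sample_pipeline is hypothetical, and whether Compass sends exactly this pipeline is an assumption; only the $match and $sample stages themselves are standard MongoDB aggregation stages.

```python
from typing import Any, Dict, List, Optional


def build_sample_pipeline(
    sample_size: int = 1000,
    query: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that randomly samples documents.

    Hypothetical helper for illustration; not taken from Compass itself.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be a positive integer")

    pipeline: List[Dict[str, Any]] = []
    if query:
        # Restrict sampling to the subset of documents matching the query.
        pipeline.append({"$match": query})
    # $sample picks `size` documents at random from its input.
    pipeline.append({"$sample": {"size": sample_size}})
    return pipeline


# With PyMongo (assumed to be installed and connected), the pipeline
# could be passed to: collection.aggregate(build_sample_pipeline(1000))
```

Every run of such a pipeline can return a different set of documents, which is why the min and max shown for a sampled field can change between refreshes.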

    Collections in MongoDB 3.0 and 2.6 are sampled via a backwards compatible algorithm executed entirely within Compass. It comprises three phases:

    1. Query for a stream of _id values, limit 10000 descending by _id
    2. Read the stream of _ids and save sampleSize randomly chosen values. We employ reservoir sampling to perform this efficiently.
    3. Then query the selected random documents by _id

    The choice of sampling method is transparent in usage to the end-user.
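
The three phases above can be sketched as follows. This uses Algorithm R, one classic form of reservoir sampling; the docs do not say which variant Compass uses, so treat that choice, the function names, and the in-memory list of ids standing in for the _id stream as assumptions.

```python
import random
from typing import Any, Dict, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T],
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Keep `sample_size` uniformly chosen items from a stream (Algorithm R)."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < sample_size:
            # Fill the reservoir with the first `sample_size` items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability sample_size / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < sample_size:
                reservoir[j] = item
    return reservoir


def ids_filter(ids: List[Any]) -> Dict[str, Any]:
    """Phase 3: a query filter selecting the sampled documents by _id."""
    return {"_id": {"$in": ids}}


# Phase 1 stand-in: pretend these are the 10000 most recent _id values.
recent_ids = list(range(10000))
# Phase 2: pick 1000 of them in a single pass over the stream.
chosen = reservoir_sample(recent_ids, 1000)
# Phase 3: the filter that would be sent to the server.
query = ids_filter(chosen)
```

Note that, as described in phase 1, only the first 10000 _id values in descending order enter the stream, so documents outside that window cannot appear in the sample at all.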

    sampleSize is currently set to 1000 documents.
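
To see why a 1000-document sample tends to miss the true minimum, here is a small simulation, followed by a pipeline that scans the whole collection for exact values. The data is synthetic and the function names are hypothetical; $group, $min, and $max are standard aggregation operators, but running the pipeline requires a driver such as PyMongo, which is only mentioned in a comment here.

```python
import random
from typing import Any, Dict, List, Optional, Sequence, Tuple


def sampled_min_max(
    values: Sequence[int],
    sample_size: int = 1000,
    seed: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (min, max) of a random sample drawn without replacement."""
    rng = random.Random(seed)
    k = min(sample_size, len(values))
    sample = rng.sample(list(values), k)
    return min(sample), max(sample)


def exact_min_max_pipeline(field: str) -> List[Dict[str, Any]]:
    """Aggregation pipeline computing the exact min and max of `field`.

    Unlike sampling, this reads every document in the collection.
    """
    return [
        {
            "$group": {
                "_id": None,  # a single group covering all documents
                "min": {"$min": "$" + field},
                "max": {"$max": "$" + field},
            }
        }
    ]


# Synthetic collection: 100000 documents whose key takes values 1..100000.
values = list(range(1, 100001))
lo, hi = sampled_min_max(values, sample_size=1000)
# The sample min can never be below the true min (1), and it equals 1 only
# if that one document happens to be among the 1000 sampled.
print("true min:", min(values), "sampled min:", lo)

# With PyMongo (assumed), exact values for a hypothetical field "myKey":
# list(collection.aggregate(exact_min_max_pipeline("myKey")))
```

In this synthetic setup the single document holding the value 1 lands in the sample in only about 1% of runs, which matches the behaviour in the question: the plotted minimum is close to the real one, but not equal to it.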