Search code examples
pythonmongodbperformanceresteve

Collect records into single array in Eve/mongodb to reduce bandwidth


I have a record which is a dictionary of performance sampling at a specific revision of our source code. I am storing this in our eve database. We do this performance test for every revision. We have over 20,000 revisions.

I can get the values using http://host/api/performance?projection={"FileIO.Reads":1,"Revision":1}, which gives me 20,000 records with the following:

{
    "_items" : [
        { "_id" : ... ,
          "_updated": ...,
          "_created":...,
          "_etag":...,
          "Revision":1000,
          "FileIO" : {
            { "Reads": [20.34,10,30] } # avg/min/max
          }
        },
        # next item
        { "_id" : ... ,
          "_updated": ...,
          "_created":...,
          "_etag":...,
          "Revision":1001,
          "FileIO" : {
            { "Reads": [23,10,50] } # avg/min/max
          }
        }
        # and so on
]
}

Is there some way to ask Eve, or even better MongoDB, to group all of these into a single value of the form of [ [Revision, Reads], [Revision, Reads]... ] or even [Revision, Avg, Min, Max] to minimize the JSON conversion, performance and bandwidth cost?

Should I do my own processing in the event hooks? If so, in what way?

I think I should be able to do this with aggregation of some type but it isn't clear how to merge my revision with my FileIO Reads.

I don't really have any other ideas how to store this data - we just have a dictionary of performance values per revision.


Solution

  • I did some sleuthing and mucking about and came up with the following aggregation pipeline. I don't know if it is efficient but it does what I need it to do. I guess I kind-of understand how it works but the double grouping seems like it should be unnecessary.

    db.getCollection('test_profiles').aggregate( [
        { $group: { 
            _id : { revision :"$revision", value : "$FileIO.Reads" }
        }},
        { $unwind : "$_id"},
        { $group: { 
            _id : null,
            values:
            { $push: "$_id" }
        }}
    ])
    

    This yields the following kind of record:

    {
        "_id" : null,
        "values" : [ 
            {
                "revision" : 109999,
                "value" : [ 
                    0.903873742, 
                    0.00723229861, 
                    1.23190153
                ]
            }, 
            {
                "revision" : 109998,
                "value" : [ 
                    0.903873742, 
                    0.00723229861, 
                    1.23190153
                ]
            },
            // .. and on and on 
        ]
    }