Search code examples
mongodbmapreduceprojection

Projecting a sub query in mongo


I am using a dynamic form builder to store values into mongo. Meaning the fields are defined at runtime.

Now I am trying to let the user build a dynamic grid view of data so that they can choose which fields they want to see at a glance. Thus I need to project a subset of fields stored in a collection item.

Here is an example of 2 records in the collection

{
"_id": {
    "$oid": "511ff0a8521e66d41b0d35d6"
},
"FormID": {
    "$uuid": "413ba627-94bf-0ca7-49b3-9ca2a1a3e9b5"
},
"ResultID": {
    "$uuid": "45f455ae-8486-aaa9-b97a-e480bfdf3db4"
},
"FieldValues": [
    {
        "FieldID": "first name",
        "FieldValue": "John"
    },
    {
        "FieldID": "last name",
        "FieldValue": "smith"
    },
    {
        "FieldID": "school",
        "FieldValue": "high school"
    },
    {
        "FieldID": "favorite subject",
        "FieldValue": "math"
    },

]
},

{
"_id": {
    "$oid": "511ff0a8521e66d41b0d35d7"
},
"FormID": {
    "$uuid": "413ba627-94bf-0ca7-49b3-9ca2a1a3e9b5"
},
"ResultID": {
    "$uuid": "45f455ae-8486-aaa9-b97a-e480bfdf3db5"
},
"FieldValues": [
    {
        "FieldID": "first name",
        "FieldValue": "sarah"
    },
    {
        "FieldID": "last name",
        "FieldValue": "smith"
    },
    {
        "FieldID": "school",
        "FieldValue": "high school"
    },
    {
        "FieldID": "favorite subject",
        "FieldValue": "english"
    },

]
},

Let's say I wanted to project Result Id, First Name, Last Name

In SQL -- I would have had FieldValues defined in its own table, and I would have done a sub query on Field values where ResultId = (parent's result id) and FieldID = "first name", then another sub query on "last name" and so on to flatten out the results.

I've been trying to figure out how to do this with mongo. I was able to find the $slice operator, but that only lets you get a consecutive set of array elements.

The reason why I don't want to get the whole document is that in some cases my customers have defined 400+ fields to track. Deserializing all of that on even 200 rows can mean 100MB of data passed over the network and deserialized (slow).

Any advice / suggestions would be appreciated


Solution

  • You can construct an appropriate aggregation framework syntax to return exactly what you want. It may not be fast enough but it will return the exact format you want without pulling the full document down. To make it fast I assume you would avoid running it over the entire collection by having the first stage of the pipeline be some {$match} that selects only the relevant subset of documents (and that criteria should be indexed).

    Using the following pipeline stages over your two sample documents for fields first name and last name you get back only _id and those fields. For a set of given field IDs you can generate this pipeline programmatically.

    unwind = { "$unwind" : "$FieldValues" };
    
    match = { "$match" : {
            "FieldValues.FieldID" : {
                "$in" : [
                    "first name",
                    "last name"
                ]
            }
        }
    };
    
    proj = { "$project" : {
            "first name" : {
                "$cond" : [
                    {
                        "$eq" : [
                            "first name",
                            "$FieldValues.FieldID"
                        ]
                    },
                    "$FieldValues.FieldValue",
                    "  skip"
                ]
            },
            "last name" : {
                "$cond" : [
                    {
                        "$eq" : [
                            "last name",
                            "$FieldValues.FieldID"
                        ]
                    },
                    "$FieldValues.FieldValue",
                    "  skip"
                ]
            }
        }
    };
    
    group = { "$group" : {
            "_id" : "$_id",
            "first name" : {
                "$max" : "$first name"
            },
            "last name" : {
                "$max" : "$last name"
            }
        }
    };
    
    db.project.aggregate(unwind, match, proj, group)
    {
        "result" : [
            {
                "_id" : ObjectId("511ff0a8521e66d41b0d35d7"),
                "first name" : "sarah",
                "last name" : "smith"
            },
            {
                "_id" : ObjectId("511ff0a8521e66d41b0d35d6"),
                "first name" : "John",
                "last name" : "smith"
            }
        ],
        "ok" : 1
    }