Search code examples
couchbasesql++

Filtering multiple nested array with SQL++ query


In my bucket the documents have this structure

{
    "version" : 1,
    "root1" : "root1",
    "root2" : "root2",
    "i" : [
        {
            "i1" : "i1",
            "i2" : "i2",
            "p" : [
                {
                    "id" : 1,
                    "p2" : "p2",
                    "tr" : [
                        {
                            "id" : 1,
                            "x" : "VALID",
                            "check" : false
                        },
                        {
                            "id" : 2,
                            "x" : "INVALID",
                            "check" : false
                        }
                    ]

                },
                {
                    "id" : 2,
                    "p2" : "p2",
                    "tr" : [
                        {
                            "id" : 1,
                            "x" : "VALID",
                            "check" : true
                        },
                        {
                            "id" : 2,
                            "x" : "INVALID",
                            "check" : true
                        }
                    ]

                }
            ]
        }
    ]

}

i is an array that contains the field p which is also an array and it contains also another array field tr

I want to return all document that have tr.x = "VALID" and check = false and eliminate from it trthat doesn't much this criteria. So for the last example the return should be

{
    "version" : 1,
    "root1" : "root1",
    "root2" : "root2",
    "i" : [
        {
            "i1" : "i1",
            "i2" : "i2",
            "p" : [
                {
                    "id" : 1,
                    "p2" : "p2",
                    "tr" : [
                        {
                            "id" : 1,
                            "x" : "VALID",
                            "check" : false
                        }
                    ]

                }
            ]
        }
    ]
}


Solution

  • To select:

    SELECT *
    FROM default
    WHERE ANY a IN i
          SATISFIES
            ANY b IN a.p 
            SATISFIES
                ANY c IN b.tr 
                SATISFIES 
                    c.x = 'VALID' AND c.check = false
                END
            END
          END
    

    This simply nests the conditions for each nested array until the individual fields can be filtered.

    An approach to returning only the matched array data is to apply similar logic:

    SELECT OBJECT_PUT(default
                     ,"i"
                     ,ARRAY OBJECT_PUT(ii
                                      ,"p"
                                      ,ARRAY OBJECT_PUT(ppp
                                                       ,"tr"
                                                       ,ARRAY tttt 
                                                        FOR tttt IN ppp.tr 
                                                        WHEN
                                                            tttt.x = 'VALID' AND tttt.check = false 
                                                        END
                                                       )
                                       FOR ppp IN ii.p 
                                       WHEN
                                         ANY ttt IN ppp.tr
                                         SATISFIES
                                            ttt.x = 'VALID' AND ttt.check = false
                                         END 
                                       END
                                      ) 
                      FOR ii IN default.i 
                      WHEN 
                        ANY pp IN ii.p 
                        SATISFIES 
                            ANY tt IN pp.tr
                            SATISFIES 
                                tt.x = 'VALID' AND tt.check = false 
                            END
                        END
                      END
                     )
    FROM default
    WHERE ANY a IN i
          SATISFIES
            ANY b IN a.p 
            SATISFIES
                ANY c IN b.tr 
                SATISFIES 
                    c.x = 'VALID' AND c.check = false
                END
            END
          END
    

    which just replaces (the OBJECT_PUT functions) elements with filtered elements at each nesting level. The same element selection is repeated at each level since the filtering occurs on the elements before the OBJECT_PUT is applied.

    Undoubtedly there are other approaches too.

    HTH.