
Apache Drill query mongo array field type with IN() operator


Take the following sample documents, against which I need to run a SELECT ... WHERE field IN (values) query in Drill.

{   "CD_MATRICULA" : 12,
    "USUARIO" : {
        "ID_SITUACAO" : 1,
        "PUBLICOALVO" : [ 84,85,86,87,88,89 ]
    }
},
{   "CD_MATRICULA" : 14,
    "USUARIO" : {
        "ID_SITUACAO" : 1,
        "PUBLICOALVO" : [ 90,91,92,93,94 ]
    }
},
{   "CD_MATRICULA" : 122,
    "USUARIO" : {
        "ID_SITUACAO" : 0,
        "PUBLICOALVO" : [ 20,300,400,500,600 ]
    }
}

To find documents by USUARIO.PUBLICOALVO value, I can use the mongo query that follows:

db.getCollection('xxx').find({'USUARIO.PUBLICOALVO': {$in: [ 84, 85, 90, 94, 500 ]}})

It works fine, returning all the documents whose list matches the IN() comparison.
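
For reference, the matching rule that `$in` applies to an array field (a document matches when at least one array element appears in the list) can be sketched in plain Python. This is only an illustration over the three sample documents; `matches_in` is a hypothetical helper, not part of MongoDB or Drill.

```python
# Sample documents copied from the question.
docs = [
    {"CD_MATRICULA": 12, "USUARIO": {"ID_SITUACAO": 1, "PUBLICOALVO": [84, 85, 86, 87, 88, 89]}},
    {"CD_MATRICULA": 14, "USUARIO": {"ID_SITUACAO": 1, "PUBLICOALVO": [90, 91, 92, 93, 94]}},
    {"CD_MATRICULA": 122, "USUARIO": {"ID_SITUACAO": 0, "PUBLICOALVO": [20, 300, 400, 500, 600]}},
]


def matches_in(doc, wanted):
    """Hypothetical helper: True when any PUBLICOALVO element is in `wanted`."""
    return any(value in wanted for value in doc["USUARIO"]["PUBLICOALVO"])


wanted = {84, 85, 90, 94, 500}
matched = [doc["CD_MATRICULA"] for doc in docs if matches_in(doc, wanted)]
print(matched)  # [12, 14, 122]
```

All three sample documents match, because each array shares at least one value with the list.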

But when I try to express the same Mongo query in Drill SQL, I write this:

SELECT * FROM xxx WHERE xxx.USUARIO.PUBLICOALVO IN(84, 85, 90, 94, 500);

But this query fails, with the message:

Error in expression at index -1.  Error: Missing function implementation: [equal(INT-REPEATED, INT-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.

How can I write this kind of IN() filter in Drill SQL syntax?

thanks a lot


Solution

  • Apache Drill's Mongo storage plugin does not support the IN operator in its predicate pushdown.

    The original documentation for Apache Drill's Mongo storage plugin stated:

    As of now, predicate pushdown is implemented for the following filters: >, >=, <, <=, ==, !=, isNull and isNotNull.

    Looking at the latest version of the code, this remains the case:

    switch (functionName) {
        case "equal":
          compareOp = MongoCompareOp.EQUAL;
          break;
        case "not_equal":
          compareOp = MongoCompareOp.NOT_EQUAL;
          break;
        case "greater_than_or_equal_to":
          compareOp = MongoCompareOp.GREATER_OR_EQUAL;
          break;
        case "greater_than":
          compareOp = MongoCompareOp.GREATER;
          break;
        case "less_than_or_equal_to":
          compareOp = MongoCompareOp.LESS_OR_EQUAL;
          break;
        case "less_than":
          compareOp = MongoCompareOp.LESS;
          break;
        case "isnull":
        case "isNull":
        case "is null":
          compareOp = MongoCompareOp.IFNULL;
          break;
        case "isnotnull":
        case "isNotNull":
        case "is not null":
          compareOp = MongoCompareOp.IFNOTNULL;
          break;
    }
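
The list of pushed-down functions in the switch statement above can be restated as a simple lookup. This is a minimal sketch for checking a function name against that list, not Drill's actual API; `PUSHED_DOWN_FUNCTIONS` and `is_pushed_down` are hypothetical names, and the `"in"` and `"like"` strings in the usage lines are placeholders for illustration.

```python
# Function names copied from the case labels of the switch statement.
PUSHED_DOWN_FUNCTIONS = {
    "equal",
    "not_equal",
    "greater_than_or_equal_to",
    "greater_than",
    "less_than_or_equal_to",
    "less_than",
    "isnull", "isNull", "is null",
    "isnotnull", "isNotNull", "is not null",
}


def is_pushed_down(function_name):
    """Hypothetical check: True when the name has a case label in the switch."""
    return function_name in PUSHED_DOWN_FUNCTIONS


print(is_pushed_down("equal"))  # True
print(is_pushed_down("in"))     # False (placeholder name, no matching case label)
print(is_pushed_down("like"))   # False (placeholder name, no matching case label)
```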
    

    FWIW, IN is not the only unsupported operator; the LIKE operator is not supported yet either, though there is an open issue against the Drill Mongo storage plugin for that.

    So you could ...

    • Implement the IN operator yourself. There's a patch attached to this issue which might provide some guidance on how to implement the IN operator.
    • Raise an issue against the Drill project specifying Component=Storage - MongoDB
    • Implement your IN as a series of ORed equals, e.g. instead of WHERE xxx.USUARIO.PUBLICOALVO IN (84, 85, 90, 94, 500), you could try WHERE xxx.USUARIO.PUBLICOALVO = 84 OR xxx.USUARIO.PUBLICOALVO = 85 ...
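
If the value list is built at runtime, the ORed-equals form from the last option can be generated rather than typed by hand. The following is a minimal sketch that only builds the SQL string; `expand_in_to_or` is a hypothetical helper, and nothing here talks to Drill.

```python
def expand_in_to_or(column, values):
    """Hypothetical helper: build `col = v1 OR col = v2 ...` from integer values."""
    if not values:
        raise ValueError("values must not be empty")
    # int() coerces each value to an integer and raises on input that cannot
    # be converted, so no arbitrary text ends up in the generated SQL.
    return " OR ".join(f"{column} = {int(v)}" for v in values)


predicate = expand_in_to_or("xxx.USUARIO.PUBLICOALVO", [84, 85, 90, 94, 500])
query = f"SELECT * FROM xxx WHERE {predicate}"
print(query)
```

One caveat: the error in the question already names equal(INT-REPEATED, INT-REQUIRED), so Drill may reject an equality comparison between the array field and a single value in the same way. This has not been tested here, so treat the generated query as something to try rather than a confirmed fix.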