Search code examples
apacheapache-drillmapr

Apache Drill - Getting error on KVGEN method


I have a huge dataset where each record have json data similar to below -

{"project":{"id":"2625","createDate":1542597000000,"rank":0,"highlight":false,"isDisplay":true,"isNewProject":true,"propertyId":2231,"districts":{"id":41,"name":"abc","region":"123"}}}

When I am trying to genrate key value pairs using select kvgen(t.project) from dfs.filePath t in apache drill, I am getting below error -

DrillRuntimeException: Mappify/kvgen does not support heterogeneous value types. All values in the input map must be of the same type. The field [createDate] has a differing type [minor_type: BIGINT mode: OPTIONAL ]

It looks like drill expects all values to be of same type. But how to do that? Is there any function available in drill?

My drill version is 1.9.0


Solution

  • I figured it out. KVGEN method doesn't work if json is nested. To make it work, there are two approaches which can be followed -

    • Take out the nested json outside

    {"project":{"id":"2625","createDate":1542597000000,"rank":0,"highlight":false,"isDisplay":true,"isNewProject":true,"propertyId":2231},"districts":{"id":41,"name":"abc","region":"123"}}

    and then apply KVGEN method as select kvgen(t.project) from dfs.filePath t

    • Apply kvgen method on inner json first and then use nested query as below

      select tbl2.col1.id, tbl2.col2.value from (select tbl1.project as col1, flatten(kvgen(tbl1.project.districts)) col2 from dfs.filePath tbl1) tbl2

    And as rightly mentioned by @arina-yelchiyeva, session option store.json.all_text_mode needs to be set to true.