Search code examples
hadoophiveavrohive-serde

unable to get avro data for array<struct<string, string>> in rows


Hi I have a avro schema with array of struct and I am able to save data as avro. but while retrieving the data from

array<struct<string, string>>

I am unable to get in rows. All the data I am getting in single row.

here is the table definition

CREATE EXTERNAL TABLE meterevents ROW FORMAT SERDE org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/......' TBLPROPERTIES ('avro.schema.url'='/..../schema.avsc');

hive table structure

nametype                struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>       from deserializer
names                   struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>  from deserializer
enddeviceeventdetails   struct<enddeviceeventdetailsname:string,enddeviceeventdetailsvalue:string>      from deserializer
enddeviceevent          struct<mrid:string,createddatetime:string,issuerid:string,issuertrackingid:string,reason:string,severity:string,userid:string,asset:struct<assetmrid:string,assetnames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>>,enddeviceeventdetails:array<struct<enddeviceeventdetailsname:string,enddeviceeventdetailsvalue:string>>,enddeviceeventtype:string,enddeviceeventnames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>,status:struct<statusdatetime:string,statusreason:string,statusremark:string,statusvalue:string>,usagepoint:struct<usagepointmrid:string,usagepointnames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>>>       from deserializer
enddeviceeventtype      struct<enddeviceeventtypemrid:string,enddeviceeventtypedomain:string,enddeviceeventtypeeventoraction:string,enddeviceeventtypesubdomain:string,type:string,enddeviceeventtypenames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>>       from deserializer
header                  struct<noun:string,context:string,verb:string,value:string,source:string,timestamp:string,correlationid:string,name:string,messageid:string,property:struct<propertyname:array<string>,propertyvalue:array<string>>>    from deserializer
payload                 struct<enddeviceevents:array<struct<mrid:string,createddatetime:string,issuerid:string,issuertrackingid:string,reason:string,severity:string,userid:string,asset:struct<assetmrid:string,assetnames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>>,enddeviceeventdetails:array<struct<enddeviceeventdetailsname:string,enddeviceeventdetailsvalue:string>>,enddeviceeventtype:string,enddeviceeventnames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>,status:struct<statusdatetime:string,statusreason:string,statusremark:string,statusvalue:string>,usagepoint:struct<usagepointmrid:string,usagepointnames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>>>>,enddeviceeventtype:array<struct<enddeviceeventtypemrid:string,enddeviceeventtypedomain:string,enddeviceeventtypeeventoraction:string,enddeviceeventtypesubdomain:string,type:string,enddeviceeventtypenames:array<struct<name:string,nametype:struct<nametypedescription:string,nametypename:string,nametypeauthority:struct<nametypeauthorityname:string,nametypeauthoritydescription:string>>>>>>>

I am using "LATERAL VIEW explode" option in my query

select eddetails.enddeviceeventdetailsname,	eddetails.enddeviceeventdetailsvalue 
FROM meterevents_tmp 
LATERAL VIEW explode(payload.enddeviceevents.enddeviceeventdetails)  ed AS eddetails 
limit 1;

but still I am getting the data in single row.

enddeviceeventdetailsname       enddeviceeventdetailsvalue
["EventSequenceNumber","EventSequenceNumber","EventSequenceNumber","EventSequenceNumber"]       ["683","684","685","686"

I would like to have this data as

enddeviceeventdetailsname       enddeviceeventdetailsvalue
EventSequenceNumber              683
EventSequenceNumber              684
EventSequenceNumber              685
EventSequenceNumber              686

I have read the other question on the stackoverflow : Exploding Array of Struct using HiveQL

But unable to get the expected output. Since in that post its hive external table and not a serde where I am unable to specify "MAP KEYS TERMINATED BY" and "COLLECTION ITEMS TERMINATED BY"

any help is much appreciated.

Thanks


Solution

  • I was able to resolve this ---

    I was not able to get the output in rows because the

    array<struct<string,string>> 
    

    was part of parent array

    array<struct<array<struct<string, string>>>
    

    I updated my query and used the nested explod

    select eddetails.enddeviceeventdetailsname, eddetails.enddeviceeventdetailsvalue from (select ede.enddeviceeventdetails FROM meterevents_tmp LATERAL VIEW explode(payload.enddeviceevents)  e AS ede) t LATERAL VIEW explode(t.enddeviceeventdetails)  ed AS eddetails limit 10;
    

    I got the desired output -

    enddeviceeventdetailsname       enddeviceeventdetailsvalue
    EventSequenceNumber     683
    EventSequenceNumber     684
    EventSequenceNumber     685
    EventSequenceNumber     686