
Hive MAP isn't reading input correctly


I am trying to create a table over this Mahout recommender system output data stored on S3:

703209355938578 [18519:1.5216354,18468:1.5127649,17962:1.5094717,18317:1.5075916]
828667482548563 [18070:1.0,18641:1.0,18632:1.0,18770:1.0,17814:1.0,18095:1.0]
1705358040772485 [18783:1.0,17944:1.0,18632:1.0,18770:1.0,18914:1.0,18386:1.0]

using this schema:

CREATE EXTERNAL TABLE user_ad_reco (
  userid BIGINT,
  reco MAP<BIGINT, DOUBLE>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LOCATION 's3://xxxxx/data/RS/output/m05/';

But when I read the data back with Hive:

hive> select * from user_ad_reco limit 10;

it gives output like this:

703209355938578 {18519:1.5216354,18468:1.5127649,17962:null}
828667482548563 {18070:1.0,18641:1.0,18632:1.0,18770:1.0,17814:null}
1705358040772485 {18783:1.0,17944:1.0,18632:1.0,18770:1.0,18914:null}

So the last key:value pair of each input map is missing from the output, and the last pair in the output has a null value :(

Can anyone help with this?


Solution

  • Reason for the nulls:

    • The brackets in the input data keep the row format from being read properly: the last map value, 1.5075916, is read as 1.5075916], which cannot be converted to double, so Hive returns null because of the data type mismatch.

    703209355938578 [ 18519:1.5216354,18468:1.5127649,17962:1.5094717,18317:1.5075916 ]

    • Input data without the brackets is read cleanly (tested):

    703209355938578 18519:1.5216354,18468:1.5127649,17962:1.5094717,18317:1.5075916
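
If regenerating the files without brackets is not practical, one possible workaround is to read the second column as a plain string and strip the brackets at query time. The following is only a sketch under assumptions: the table name user_ad_reco_raw and the column name reco_raw are hypothetical, and the fields are assumed to be tab-separated as in the schema above. It relies on Hive's built-in regexp_replace and str_to_map functions.

```sql
-- Hypothetical staging table: the bracketed list is kept as a raw string.
CREATE EXTERNAL TABLE user_ad_reco_raw (
  userid   BIGINT,
  reco_raw STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://xxxxx/data/RS/output/m05/';

-- Remove '[' and ']' and then split the remainder into a map:
-- ',' separates the entries, ':' separates each key from its value.
SELECT userid,
       str_to_map(regexp_replace(reco_raw, '\\[|\\]', ''), ',', ':') AS reco
FROM user_ad_reco_raw
LIMIT 10;
```

Note that str_to_map returns a map<string,string>, so the keys and values would still need a cast (for example CAST(reco['18519'] AS DOUBLE)) wherever numeric types are required.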