I created a Hive(3.1.2) table from a BSON file dump from MongoDB (4.0).After creating the table, I select couples of entries from the table. However some of them value is null.
I tried to print the table row from BSON using python. It printed the values correct. Means the value not missing. Any clue about how to further trouble shoot?
SQL to create hive table.
CREATE EXTERNAL TABLE `tmp_test_status`(
`id` string COMMENT 'frame_id',
`createdAt` INT,
`updatedAt` string,
`task` string)
row format serde 'com.mongodb.hadoop.hive.BSONSerDe'
with serdeproperties('mongo.columns.mapping'='{"id":"_id"}')
stored as inputformat 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
outputformat 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
LOCATION
'oss://data-warehouse/hive/warehouse/data.db/tmp_test_status';
===========================================
Data I printed by python bson lib.
{'_id': '00003a02-280d-4e59-8483-a0143e0a3359', 'createdAt': '1557999191951', 'updatedAt': '1557999191951', 'task': 'lane', '__v': 0}
===========================================
Data I selected from Hive table:
00003a02-280d-4e59-8483-a0143e0a3359 NULL NULL lane
093e72ae-206b-4112-ac28-5ba38f9485d0 NULL NULL lane
093ebe41-183c-47b4-ab25-93336875ae10 NULL NULL lane
093ec16b-ba1d-4ddc-90bc-9981342e8071 NULL NULL lane
I found the answer my self, the reason is that the BSON file attribute name distinguish lower and upper case, but Hive not. If the attribute name contain upper case in BSON file, then Hive will return NULL when query.Simply map the attribute name by table properties worked for me.
with serdeproperties('mongo.columns.mapping'='{"id":"_id", "createdAt": "createdAt", "updatedAt": "updatedAt", "reLabeled1" : "reLabeled1", "isValid": "isValid"}')