json hadoop apache-pig user-defined-functions elephantbird

Load JSON data enclosed in square brackets using Elephant Bird JsonLoader in Apache Pig


Using the Elephant Bird JsonLoader, I'm able to load the data if each record is in this format:

{"disknum":36,"disksum":136.401,"disk_rate":1872.0,"disk_lnum": 13}

But the actual data is in the format below, enclosed in square brackets:

[{"disknum":36,"disksum":136.401,"disk_rate":1872.0,"disk_lnum": 13}]

When I try to parse this, it neither throws an error nor gives any useful output. It reports success, with 0 records read and 0 records written.

Please advise how to handle data enclosed in square brackets.

Below is my script for records without square brackets:

register '/home/data/Desktop/elephantbird/elephant-bird-core-4.1.jar';
register '/home/gopal/Desktop/elephantbird/elephant-bird-hadoop-compat-4.1.jar';
register '/home/gopal/Desktop/elephantbird/elephant-bird-pig-4.1.jar';
register '/home/gopal/Desktop/elephantbird/json-simple-1.1.jar';
a = LOAD '/pig/tc1.log' USING com.twitter.elephantbird.pig.load.JsonLoader() as (json:map[]);
b = FOREACH a GENERATE
    flatten(json#'node_disk_lnum_1') AS node_disk_lnum_1,
    flatten(json#'node_disk_xfers_in_rate_sum') AS node_disk_xfers_in_rate_sum,
    flatten(json#'node_disk_bytes_in_rate_22') AS node_disk_bytes_in_rate_22,
    flatten(json#'node_disk_lnum_7') AS node_disk_lnum_7;
dump b;

Please advise! Thanks in advance :)


Solution

  • I think this might help: see the solution to "Json parse with elephantbird in Pig"; it's pretty close.

    You need to provide a root name for your JSON.
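    As a rough illustration of what "providing a root name" could look like, here is a minimal Python preprocessing sketch, not part of the original answer. It assumes each input line holds one JSON array of records, as in the question. The root key "records" and the helper names unwrap_line, wrap_line and convert are hypothetical. Two options are shown: unwrapping the array into one object per line (the format the question says already loads), or wrapping the array under a root key. Reading the wrapped form in Pig would still need the loader to handle the nested array, which is not shown here.

```python
import json

# Hypothetical root name; any key would do.
ROOT_KEY = "records"


def unwrap_line(line):
    """Split one '[{...}, {...}]' line into a list of one-object JSON strings."""
    parsed = json.loads(line)
    # Accept a bare object too, so already-clean lines pass through unchanged.
    records = parsed if isinstance(parsed, list) else [parsed]
    return [json.dumps(record) for record in records]


def wrap_line(line, root=ROOT_KEY):
    """Wrap whatever the line contains under a single root key."""
    return json.dumps({root: json.loads(line)})


def convert(src_path, dst_path):
    """Rewrite a file of bracketed lines as one JSON object per line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            for out in unwrap_line(line):
                dst.write(out + "\n")
```

    For example, convert('tc1.log', 'tc1_flat.log') (the output file name is made up) would produce a file that could be copied to HDFS and loaded with the same script as the unbracketed records.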