
Apache PIG, JSON Loader


This is my sample input file:

[{"disknum":36,"disksum":136.401,"disk_rate":1872.0,"disk_lnum": 13}]
[{"disknum":36,"disksum":105.2,"disk_rate":123084.8,"disk_lnum": 13}]

I'm trying to parse this JSON data using JsonLoader in Pig.

Here is my script:

a = LOAD '/pig/tc.log' using JsonLoader ('disknum:chararray,disksum:chararray,disk_rate:chararray,disk_lnum:chararray');

b = FOREACH a GENERATE disknum,disksum,disk_rate,disk_lnum;

DUMP b;

Expected output:

36,136.401,1872.0,13

36,105.2,123084.8,13

Actual Output:

( )

What am I missing?


Solution

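  • Notice the [] surrounding the objects in your file. Each line is a JSON array wrapping one object rather than a bare object, so the built-in JsonLoader cannot read it against your schema, which is why DUMP prints nothing useful. You can either load each record into a map and access the fields by key, or use the Elephant Bird JSON loader.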

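    -- The schema argument to JsonLoader must be a quoted string.
    a = LOAD '/pig/tc.log' using JsonLoader('json:map[]');
    -- Look up each value in the map by key with the # operator.
    b = FOREACH a GENERATE flatten(json#'disknum') AS disknum,
                           flatten(json#'disksum') AS disksum,
                           flatten(json#'disk_rate') AS disk_rate,
                           flatten(json#'disk_lnum') AS disk_lnum;
    DUMP b;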
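If you would rather keep the built-in JsonLoader with a flat schema, another option is to strip the brackets first. This is a sketch, not a tested recipe: it assumes every line holds exactly one object wrapped in [ ], as in the sample, and `/pig/tc_clean` is a hypothetical scratch directory. It reads the raw lines with TextLoader, keeps only the text between the outer brackets with REGEX_EXTRACT, stores the cleaned lines, and re-loads them with JsonLoader, listing the fields in the same order they appear in the file.

```pig
-- Sketch: assumes each line is exactly one JSON object wrapped in [ ].
-- Read every line as a single chararray.
raw = LOAD '/pig/tc.log' USING TextLoader() AS (line:chararray);

-- Keep only what sits between the outer brackets, i.e. the bare {...} object.
objs = FOREACH raw GENERATE REGEX_EXTRACT(line, '^\\s*\\[(.*)\\]\\s*$', 1) AS line;

-- '/pig/tc_clean' is a hypothetical scratch directory (it must not exist yet).
STORE objs INTO '/pig/tc_clean' USING PigStorage();
-- Make sure the STORE has finished before the LOAD below reads its output.
EXEC;

-- Fields are listed in the same order as in the file.
a = LOAD '/pig/tc_clean' USING JsonLoader('disknum:int, disksum:double, disk_rate:double, disk_lnum:int');
DUMP a;
```

Typed fields (int, double) are used here so the values can be used in arithmetic directly; the chararray schema from the question would also work once the brackets are gone. Note that Hadoop refuses to write into an output directory that already exists, so remove `/pig/tc_clean` before re-running.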