Headers in Pig Output


I have written a working script that counts the total number of steps taken by pedestrians and their highest step count. What I can't figure out is how to produce headers in the Pig output so that it looks neat and clean. Is there any way to write headers along with the output? My code is as follows:

register 'piggybank-0.15.0.jar';
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();
part1 = LOAD '/home/cloudera/Pedestrian_Counts.csv' using CSVLoader(',') as (date_time, sensor_id: int, sensor_name: chararray, hourly_counts: int);
part2 = GROUP part1 BY (sensor_id, sensor_name);
part3 = FOREACH part2 GENERATE FLATTEN(group) AS (sensor_id, sensor_name), SUM(part1.hourly_counts), MAX(part1.hourly_counts);
STORE part3 into '/home/cloudera/pedestrian_result' using PigStorage('\t');

The first 5 lines of my output are as follows:

1   Bourke Street Mall (North)  49591633    5573
2   Bourke Street Mall (South)  67759939    7035
3   Melbourne Central   70973929    5890
4   Town Hall (West)    90274498    8052
5   Princes Bridge  58752043    7391

Can headers be added when the output is written? Thanks in advance.


Solution

  • Either merge the data from all the part files into a single file on the local file system that has the header information at the top, or store the output of this Pig script in a Hive table.

    A Hive table has its own schema, so the column names are kept with the table rather than written into the data files.

    To access Hive from Pig, use HCatalog (HCat).
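
The Hive option could look roughly like the sketch below. It is not a tested solution: the table name pedestrian_result and the column names total_count and max_count are made up for illustration, and the table is assumed to already exist in Hive with columns whose names and types match the relation being stored. The HCatStorer class name shown is the one shipped with recent Hive releases; older HCatalog builds use a different package.

```pig
-- Sketch only. Assumes a Hive table named pedestrian_result (hypothetical) already exists
-- with columns: sensor_id INT, sensor_name STRING, total_count BIGINT, max_count INT.
-- Start Pig with HCatalog on the classpath:  pig -useHCatalog script.pig
register 'piggybank-0.15.0.jar';
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

part1 = LOAD '/home/cloudera/Pedestrian_Counts.csv' USING CSVLoader()
        AS (date_time, sensor_id: int, sensor_name: chararray, hourly_counts: int);
part2 = GROUP part1 BY (sensor_id, sensor_name);

-- Name every output field so the aliases line up with the Hive column names.
part3 = FOREACH part2 GENERATE
            FLATTEN(group) AS (sensor_id, sensor_name),
            SUM(part1.hourly_counts) AS total_count,
            MAX(part1.hourly_counts) AS max_count;

-- Write into the existing Hive table instead of a plain PigStorage directory.
STORE part3 INTO 'pedestrian_result' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

Once the data is in Hive, the column names come from the table schema; in the Hive CLI, setting hive.cli.print.header to true should print them above query results.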