I have written a working Pig script that counts the total number of steps taken by pedestrians and finds their highest step count. What I can't figure out is how to produce headers in the Pig output so that it looks neat and clean. Is there any way to write headers along with the output? My code is as follows:
register 'piggybank-0.15.0.jar';
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();
part1 = LOAD '/home/cloudera/Pedestrian_Counts.csv' using CSVLoader(',') as (date_time, sensor_id: int, sensor_name: chararray, hourly_counts: int);
part2 = GROUP part1 BY (sensor_id, sensor_name);
part3 = FOREACH part2 GENERATE FLATTEN(group) AS (sensor_id, sensor_name), SUM(part1.hourly_counts), MAX(part1.hourly_counts);
STORE part3 into '/home/cloudera/pedestrian_result' using PigStorage('\t');
The first 5 lines of my output are as follows:
1 Bourke Street Mall (North) 49591633 5573
2 Bourke Street Mall (South) 67759939 7035
3 Melbourne Central 70973929 5890
4 Town Hall (West) 90274498 8052
5 Princes Bridge 58752043 7391
Can headers be added while writing the output? Thanks in advance.
Either merge all of the part files into a single file on the local file system and put the header line at the top of that file, or store the output of this Pig script in a Hive table.
A Hive table carries its own schema, so the column names are kept alongside the data.
To access Hive from Pig, use HCatalog (HCat).
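The first option, merging the part files under a header line, could look roughly like the sketch below. It assumes the result directory is readable from the local file system and that Pig wrote its usual `part-*` files there. The function name `merge_with_header`, the column names, and the output file name are all made up for illustration, since Pig does not record column names in the output.

```shell
#!/bin/sh
# Sketch: build one local file with a header line followed by all Pig part files.
# The column names are assumptions; adjust them to match the script's output.

# merge_with_header RESULT_DIR OUT_FILE
#   RESULT_DIR: local directory holding Pig's part-* files
#   OUT_FILE:   file to create (header first, then the part files in name order)
merge_with_header() {
    result_dir=$1
    out_file=$2
    # Write the tab-separated header line (hypothetical column names).
    printf 'sensor_id\tsensor_name\ttotal_count\tmax_hourly_count\n' > "$out_file"
    # Append every part file; the shell expands the glob in sorted order.
    cat "$result_dir"/part-* >> "$out_file"
}

# Example call (directory from the question; the output file name is an assumption):
# merge_with_header /home/cloudera/pedestrian_result /home/cloudera/pedestrian_result.tsv
```

If the result directory is on HDFS rather than the local file system, replacing the `cat` call with `hadoop fs -cat` on the same `part-*` pattern should give the same result.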