Search code examples
hadoopapache-pig

Get week using date data and do some calculation in pig


My data is something like this:

(201601030637,2,64.001213)    
(201601030756,3,63.5869656667)   
(201601040220,2,62.758471)

which the first column is year (2016) month (01) day (03) hour (06) and minutes (37) connected to each other.

I want to sum the values of third column based on the week. How can I group them to have 52 different groups for entire year? Can anyone help? Thanks!


Solution

  • Use ToDate to convert your datestring to datetime type. Then use GetWeek to get weeknumber. Finally use GROUP to group by weeknum and SUM.

    A = LOAD '/path_to_data/data' USING PigStorage(',') as (c1: chararray, c2: int, c3: float);
    B = FOREACH A GENERATE GetWeek(ToDate(c1,'yyyyMMddHHmm')) as weeknum, c1, c2, c3;
    C = FOREACH (GROUP B BY weeknum) GENERATE group as weeknum, SUM(B.c2) as c2_sum;
    
    DUMP C;