My data is something like this:
(201601030637,2,64.001213)
(201601030756,3,63.5869656667)
(201601040220,2,62.758471)
which the first column is year (2016) month (01) day (03) hour (06) and minutes (37) connected to each other.
I want to sum the values of third column based on the week. How can I group them to have 52 different groups for entire year? Can anyone help? Thanks!
Use ToDate
to convert your datestring to datetime type. Then use GetWeek
to get weeknumber. Finally use GROUP
to group by weeknum and SUM
.
A = LOAD '/path_to_data/data' USING PigStorage(',') as (c1: chararray, c2: int, c3: float);
B = FOREACH A GENERATE GetWeek(ToDate(c1,'yyyyMMddHHmm')) as weeknum, c1, c2, c3;
C = FOREACH (GROUP B BY weeknum) GENERATE group as weeknum, SUM(B.c2) as c2_sum;
DUMP C;