In a Big Data context I have a time series S1=(t1, t2, t3 ...) sorted in an ascending order. I would like to produce a series of time differences: S2=(t2-t1, t3-t2 ...)
Is there a way to do this in Apache Pig? Short of a very inefficient self-join, I do not see one.
If not, what would be an good way to do this suitable for large amounts of data?
Edit
Sample Data
2014-02-19T01:03:37
2014-02-26T01:03:39
2014-02-28T01:03:45
2014-04-01T01:04:22
2014-05-11T01:06:02
2014-06-30T01:08:56
Script
s1 = LOAD 'test2.txt' USING PigStorage() AS (t:chararray);
s11 = foreach s1 generate ToDate(t) as t1;
s1_new = rank s11;
s2 = LOAD 'test2.txt' USING PigStorage() AS (t:chararray);
s22 = foreach s2 generate ToDate(t) as t1;
s2_new = rank s22;
-- Filter records by excluding the 1 ranked row and rank the new data
ss = FILTER s2_new by (rank_s22 > 1);
ss_new = rank ss;
s3 = join s1_new by rank_s11,ss_new by rank_ss;
s4 = foreach s3 generate DaysBetween(ss_new::t1,s1_new::t1) as time_diff;
DUMP s4;