Tags: hadoop, apache-pig, resampling, bigdata

Pig: how to resample time series data?


I have a very large dataset that I am processing with Pig.

The data contains a timestamp with one-second resolution, and I would like to aggregate it at one-minute resolution: counting the observations in each minute and averaging the other variables over that minute.

Is it possible to do that using Pig? Thanks!


Solution

  • You can truncate your timestamp field to the minute (generate a new field that shortens YYYYmmddHHMMss to YYYYmmddHHMM), then group by the truncated timestamp and aggregate your data.
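
A minimal sketch of that approach, assuming the timestamp is stored as a chararray in the YYYYmmddHHMMss layout and that there is a single numeric variable to average. The input and output paths, the tab delimiter, and the field names (ts, value, minute, n_obs, avg_value) are placeholders for illustration, not taken from the question:

```pig
-- Hypothetical input: tab-separated (ts, value) rows, e.g. 20240131235959 <TAB> 3.5
raw = LOAD 'input/events.tsv' USING PigStorage('\t')
      AS (ts:chararray, value:double);

-- Keep the first 12 characters (YYYYmmddHHMM), which drops the seconds
by_minute = FOREACH raw GENERATE SUBSTRING(ts, 0, 12) AS minute, value;

-- One group per distinct minute
grouped = GROUP by_minute BY minute;

-- Count the observations and average the variable within each minute
resampled = FOREACH grouped GENERATE
    group                AS minute,
    COUNT(by_minute)     AS n_obs,
    AVG(by_minute.value) AS avg_value;

STORE resampled INTO 'output/per_minute' USING PigStorage('\t');
```

If there are several variables to average, each one would get its own AVG(by_minute.<field>) column in the last FOREACH. If the timestamp is stored in another form (for example epoch seconds), the truncation step would need to change accordingly, while the GROUP and aggregation steps stay the same.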