Tags: hadoop, apache-pig, apache-pig-grunt

Get max date from file using Pig Latin


I have a text file with a date column and some other columns. The date column values have the format 'yyyy-MM-dd HH:mm:ss'.

From this file, I would like to get the maximum (that is, the latest) date. (Note: as far as I have seen, the MAX function only works after a GROUP BY in Pig Latin.)

Is there a way to do this in Pig Latin, or is there an alternative approach?


Solution

  • Given this sample dataset:

    Apple|$600|2009-01-14 00:00:00|
    Apple|$650|2010-12-16 10:20:20|
    Banana|$800|2019-12-14 00:00:00|
    Banana|$800|2016-11-11 01:45:03|

    the maximum value of the date column can be retrieved by converting it to a datetime, sorting in descending order, and keeping only the first row:

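    -- Load the pipe-delimited file; the third field ($2) holds the timestamp
    fruits = LOAD 'fruits.txt' USING PigStorage('|');
    -- Parse the timestamp string into a Pig datetime value
    dt = FOREACH fruits GENERATE ToDate((chararray)$2, 'yyyy-MM-dd HH:mm:ss') AS d;
    -- Sort newest first and keep only the top row
    odt = ORDER dt BY d DESC;
    latest = LIMIT odt 1;
    DUMP latest;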
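    Regarding the note in the question: MAX does need a bag as input, but GROUP ... ALL puts every record into a single group, so no real grouping key is required. The following is a minimal sketch that assumes the same fruits.txt layout. The field names name, price and ts are hypothetical labels introduced here. It also assumes a Pig release whose MAX accepts datetime values (0.11 or later, as far as I know):

    ```pig
    -- Hypothetical schema: name, price and ts are illustrative field names
    fruits = LOAD 'fruits.txt' USING PigStorage('|')
             AS (name:chararray, price:chararray, ts:chararray);
    -- Parse the timestamp string into a datetime value
    dated = FOREACH fruits GENERATE ToDate(ts, 'yyyy-MM-dd HH:mm:ss') AS d;
    -- GROUP ALL collects every record into one bag, so MAX needs no key
    all_rows = GROUP dated ALL;
    -- MAX over the bag's d column yields the latest timestamp
    latest = FOREACH all_rows GENERATE MAX(dated.d) AS max_date;
    DUMP latest;
    ```

    If MAX does not accept datetime on your Pig version, apply it to the ts chararray column instead (MAX(fruits.ts) after GROUP fruits ALL). This works here because the zero-padded 'yyyy-MM-dd HH:mm:ss' format sorts lexicographically in chronological order.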