hadoop mapreduce hive apache-pig

Hadoop interview query: MapReduce, Pig, Hive


This question was asked in my Hadoop interview. I have table data like the one below.

I bought a new bike. On the 1st day I travelled 20 km. On the 2nd day the meter read 50 (day 1 + day 2), and on the 3rd day it read 60 (day 1 + day 2 + day 3).

Day Distance
1    20
2    50
3    60

Now the question: I want the output to look like this:

Day  Distance
1    20
2    30
3    10

i.e. I want the distance travelled on each individual day (1st, 2nd and 3rd) rather than the cumulative meter reading.

Answer can be in Hive/Pig/MapReduce.

Thanks


Solution

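  • This is the reverse of a running-totals problem: the table stores the cumulative reading and you need the per-day difference. You can solve it with this Hive query, which assumes the table is named mytable, with columns d (day) and dst (meter reading), and that day numbers are consecutive: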

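    -- b is mytable plus a synthetic day 0 with reading 0, so day 1 has a predecessor
    WITH b AS (
      SELECT 0 AS d, 0 AS dst
      UNION ALL
      SELECT d, dst FROM mytable
    )
    -- pair each day with the previous day and subtract the two meter readings
    SELECT a.d, a.dst - b.dst AS new_dst
    FROM mytable a
    JOIN b ON a.d = b.d + 1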
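
  • A possible alternative, sketched here under the same assumptions (a table named mytable with columns d and dst, which are names taken from the query above and not from the question), is to use the LAG windowing function. As far as I know Hive has supported windowing functions since 0.11, so this only applies if your version has them. It avoids the self-join and does not require day numbers to be consecutive, only ordered.

    ```sql
    -- Sketch only: mytable, d and dst are assumed names, as in the query above.
    SELECT
      d,
      -- LAG(dst, 1, 0) reads dst from the previous row in day order;
      -- the third argument (0) is the default used for the first day.
      dst - LAG(dst, 1, 0) OVER (ORDER BY d) AS new_dst
    FROM mytable;
    ```

    For the sample readings 20, 50, 60 this gives 20 - 0 = 20, 50 - 20 = 30 and 60 - 50 = 10, which matches the output the question asks for.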