Search code examples
sqlselectgreenplum

Shift operation grouping by a certain column SQL


So I'm with GreenPlum on a big table name purchases containing more than 4 million of rows. Here is an example of this table:

userId |        purchaseTime      | timeDiff
------------------------------------------
 17    |   2016-02-01 11:01:02    |
 17    |   2016-02-01 13:24:58    |
 17    |   2016-02-01 21:12:36    |
 67    |   2016-02-01 17:04:49    |
 84    |   2016-02-01 16:13:20    |
 94    |   2016-02-01 05:46:13    |
 94    |   2016-02-01 21:33:19    |

The table was ordered by userID and purchaseTime to help understand my goal

My objective is to update this table by including the difference of time between the current row and the last purchased time for each user.

Making it look like this:

userId |        purchaseTime      | timeDiff
------------------------------------------
 17    |   2016-02-01 11:01:02    | NULL
 17    |   2016-02-01 13:24:58    | 2:23:56
 17    |   2016-02-01 21:12:36    | 8:12:38
 67    |   2016-02-01 17:04:49    | NULL
 84    |   2016-02-01 16:13:20    | NULL
 94    |   2016-02-01 05:46:13    | NULL
 94    |   2016-02-01 21:33:19    | 16:13:06

The select from one of your answer helped me. Now I need to do the UPDATE but I'm getting a syntax error near the UPDATE doing:

WITH tmp_table AS
(
    SELECT userId ,  
       purchaseTime ,
       purchaseTime - LAG(purchaseTime )
       OVER (PARTITION BY userId  ORDER BY purchaseTime) AS timeDiff
    FROM   purchases
)

UPDATE purchases SET timeDiff = tmp_table.timeDiff
FROM tmp_table
WHERE userId   = tmp_table.userId  
AND purchaseTime = tmp_table.purchaseTime;

Can any one help me update my table?


Solution

  • So basing myself on the query from @mureinik, in order to do the update you have to do the following :

    UPDATE purchases
    SET timeDiff = tmp_table.timeDiff
    FROM (SELECT userId, purchaseTime ,
           (EXTRACT(epoch FROM purchaseTime - LAG(purchaseTime) OVER 
               (PARTITION BY userId ORDER BY purchaseTime))/60)::integer AS timeDiff
            FROM   purchases) AS tmp_table
    WHERE purchases.userId = tmp_table.userId
    AND purchases.timeDiff = tmp_table.timeDiff;
    

    In the update you will have the EXTRACT and epoch FROM statements, that's in order to return the number of seconds in the interval. If you want them in minutes divide it by 60 \60 and finally if you want to round it, just cast it to integer.