Search code examples
amazon-redshiftvertica

Is there any alternative to Vertica's conditional_true_event in RedShift?


Vertica has a very nice type of operations: Event-Based Window operations, which basically let you identify when an event occurs. For example the conditional_true_event will increment a counter each time the given boolean expression resolves to true. We use this kind of approach heavily.

We are thinking about moving to RedShift, but we would need a similar function. RedShift has some nice window functions, but I can't find this one.

Is there any way I can emulate this function using RedShift?


Solution

  • The CONDITIONAL_TRUE_EVENT() is rather easy to write with window functions. It's just a COUNT with a conditional (CASE):

    SELECT ts, symbol, bid,
           CONDITIONAL_TRUE_EVENT(bid > 10.6)  
               OVER (ORDER BY ts) AS oce
    FROM Tickstore3 
    ORDER BY ts ;
    

    becomes:

    SELECT ts, symbol, bid,
           COUNT(CASE WHEN bid > 10.6 THEN 1 END)  
               OVER (ORDER BY ts
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
              AS oce
    FROM Tickstore3 
    ORDER BY ts ;
    

    The CONDITIONAL_CHANGE_EVENT() is more complicated because it needs to use the previous value. It can be emulated using LAG() and SUM() or COUNT() (or ROW_NUMBER()). But it will require I think a CTE or a derived table (or a self-join):

    SELECT ts, symbol, bid,
           CONDITIONAL_CHANGE_EVENT(bid)  
               OVER (ORDER BY ts) AS cce
    FROM Tickstore3 
    ORDER BY ts ;
    

    will become:

    WITH emu AS
      ( SELECT ts, symbol, bid,
               CASE WHEN bid <> LAG(bid) OVER (ORDER BY ts) 
                   THEN 1 
               END AS change_bid
        FROM Tickstore3
      ) 
    SELECT ts, symbol, bid,
           COUNT(change_bid) 
               OVER (ORDER BY ts
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
             AS cce
    FROM emu 
    ORDER BY ts ;
    

    I don't know how this CONDITIONAL_CHANGE_EVENT() function behaves with nulls. If there are NULL values in the checked for changes column - and you want to see if there is a change from the last non-null value and not just the previous one - the rewrite will be even more complicated.


    Edit: As far as I understand Redshift's documentation an explicit window frame (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) is required for window aggregates when there is an ORDER BY. So, you can/have to use that (or whatever the default frame is in Vertica for these cases. It's either the above or with RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).