Search code examples
sqlverticamultiple-conditions

Vertica SQL: Disregard rows, based on two conditions


I need a where condition that considers to following for an entire table:

If a 0 exists for an ID (in column d) then exclude everything that is >0, if 0 does not exist, but exists a row where d = a then exclude everything before that..

In Example (Case 1) I want to disregard rows 1 & 2, in Example 2 (Case 2) I want to disregard rows 1,2,& 3.

Currenty I have: where d <= 0 or d = a) but in Case 1 this also returns row nr 2, which I do not want.

row nr ID d a
1 1 180 78
2 1 78 78
3 1 0 78
4 1 -67 78
5 1 -121 78
row nr ID d a
1 2 180 148
2 2 171 148
3 2 170 148
4 2 148 148
5 2 -67 148
6 2 -121 148

Solution

  • This becomes a bit more complex to do than what you expected. You will have to involve a nested query with an OLAP function to detect that each row in a partition ( defined by the value of id) belongs to a partition of which at least one row has a value of 0 for d, and then, outside of that nested query, filter for that fact, and the value of d being 0 or greater. That's case 1.

    In the other case, you use the same nested query to ascertain that you use only rows with no row with a value of 0 for d in the partition, and from there, the easiest way is to use Vertica's MATCH() clause to filter out the pattern of rows that consists of : a row with a d equal to a; zero, one or more occurrences of any row following, which I describe, in the query, with the pattern: (d_equal_a anyrow*) .

    Here goes:

    WITH
    -- YOUR INPUT, don't use in query
    indata(row_nr,ID,d,a) AS (
              SELECT 1,1,180,78
    UNION ALL SELECT 2,1,78,78
    UNION ALL SELECT 3,1,0,78
    UNION ALL SELECT 4,1,-67,78
    UNION ALL SELECT 5,1,-121,78
    UNION ALL SELECT 1,2,180,148
    UNION ALL SELECT 2,2,171,148
    UNION ALL SELECT 3,2,170,148
    UNION ALL SELECT 4,2,148,148
    UNION ALL SELECT 5,2,-67,148
    UNION ALL SELECT 6,2,-121,148
    )
    -- end of your input, real query starts here, replace following comma with "WITH"
    ,
    min_abs_d_eq_0 AS (
    -- nested query with OLAP expression returning Boolean
      SELECT
        *
        , (MIN(ABS(d)) OVER (PARTITION BY id) = 0) AS min_abs_d_eq_0
      FROM indata
    )
    ,
    case1 AS (
      SELECT
        row_nr
      , id
      , d
      , a
      , 'no match clause' AS event_name -- these are based on the 
      , 0 AS pattern_id                 -- MATCH clause coming from 
      , 0 AS match_id                   -- the next CTE, "case2"
      FROM min_abs_d_eq_0
      WHERE min_abs_d_eq_0 AND d <= 0
    )
    ,
    case2 AS (
      SELECT
        row_nr
      , id
      , d
      , a
      , event_name()
      , pattern_id()
      , match_id()
      FROM min_abs_d_eq_0
      WHERE NOT min_abs_d_eq_0
      MATCH (
        PARTITION BY id ORDER BY row_nr
        DEFINE
          d_equal_a  AS d = a
        , anyrow    AS true
        PATTERN p AS (d_equal_a anyrow*)
      )
    )
    SELECT * FROM case1
    UNION ALL
    SELECT * FROM case2
    ORDER BY id,row_nr;
    -- out  row_nr | id |  d   |  a  |   event_name    | pattern_id | match_id 
    -- out --------+----+------+-----+-----------------+------------+----------
    -- out       3 |  1 |    0 |  78 | no match clause |          0 |        0
    -- out       4 |  1 |  -67 |  78 | no match clause |          0 |        0
    -- out       5 |  1 | -121 |  78 | no match clause |          0 |        0
    -- out       4 |  2 |  148 | 148 | d_equal_a       |          1 |        1
    -- out       5 |  2 |  -67 | 148 | anyrow          |          1 |        2
    -- out       6 |  2 | -121 | 148 | anyrow          |          1 |        3