Search code examples
sqloracle-databasehierarchical

Oracle CONNECT Operations


I've read through the Oracle documentation concerning the CONNECT operations, but I can't seem to get my head around a database query we have in an existing application. Below is a simplified version of the query.

SELECT LEVEL,
       CONNECT_BY_ROOT MY_MONTH MY_LABEL,
       b.*
FROM (
    SELECT ROWNUM AS ORDERING, 
           MY_AREA,
           TRUNC (THE_MONTH, 'MONTH') AS MY_MONTH
    FROM MY_TABLE
    ORDER BY MY_AREA, MY_MONTH DESC
) b
WHERE LEVEL <= 3
START WITH 1 = 1
CONNECT BY PRIOR MY_AREA = MY_AREA
       AND PRIOR ORDERING = ORDERING - 1
       AND PRIOR MY_MONTH <= ADD_MONTHS(MY_MONTH, 6);

While I have a basic understanding of the CONNECT functionalities, this combination has me lost. Can anyone explain what is going on in this query?

I think the end says to get all of the rows that have the same area and a row number 1 less than the current row number and a date before 6 months in the future from the current date. I would guess this would only return 1 row (due to the row number criteria) or 0 rows if the other criteria weren't met. And then maybe the first CONNECT_BY_ROOT says to get that row's MY_MONTH value?


Solution

  • Start with b, which is a table of MY_AREA (a number?), MY_MONTH, which is a month-truncated date (i.e. the days are all set to 01), and an aliased ROWNUM, which is determined by the ORDER BY clause, which is ORDER BY MY_AREA, MY_MONTH DESC, e.g.:

    +----------+---------+-----------+
    | ORDERING | MY_AREA | MY_MONTH  |
    +----------+---------+-----------+
    | 1        | 10      | 01-SEP-12 |
    | 2        | 10      | 01-JAN-12 |
    | 3        | 12      | 01-AUG-12 |
    | 4        | 12      | 01-JUN-12 |
    | 5        | 12      | 01-MAY-12 |
    | 6        | 12      | 01-JAN-12 |
    | 7        | 12      | 01-JAN-10 |
    +----------+---------+-----------+
    

    The WHERE clause doesn't come into play until later, so move on to START WITH, which says only 1 = 1. This means that every row in b will be used in the query; if you had had another condition here, e.g. my_area < 5 or whatever, only a certain set of rows would have been used.

    Now, the CONNECT BY, which determines how the hierarchy should be built. This works like a WHERE clause, except for the special PRIOR keyword which tells the DB to look at the previous level in the hierarchy. So:

    • PRIOR MY_AREA = MY_AREA just means that the child node has to have the same value for `MY_AREA'
    • PRIOR ORDERING = ORDERING - 1 means that the child should come one row after the current node in b's ordering.
    • PRIOR MY_MONTH <= ADD_MONTHS(MY_MONTH, 6) means that in order to be joined into the hierarchy, the previous MY_MONTH should be 6 months or less after the date of the current node.

    The whole hierarchy is then created. LEVEL (special for CONNECT BY...) is set to the level in the hierarchy, CONNECT_BY_ROOT gives the MY_MONTH value for the root of that hierarchy and aliases it to MY_LABEL. After this, the table would look something like the following table. I've added separators for each hierarchy for clarity.

    +-------+-----------+----------+---------+-----------+
    | LEVEL | MY_LABEL  | ORDERING | MY_AREA | MY_MONTH  |
    +-------+-----------+----------+---------+-----------+
    | 1     | 01-SEP-12 | 1        | 10      | 01-SEP-12 |
    +-------+-----------+----------+---------+-----------+
    | 1     | 01-JAN-12 | 2        | 10      | 01-JAN-12 |
    +-------+-----------+----------+---------+-----------+
    | 1     | 01-AUG-12 | 3        | 12      | 01-AUG-12 |
    | 2     | 01-AUG-12 | 4        | 12      | 01-JUN-12 |
    | 3     | 01-AUG-12 | 5        | 12      | 01-MAY-12 |
    | 4     | 01-AUG-12 | 6        | 12      | 01-JAN-12 |
    +-------+-----------+----------+---------+-----------+
    | 1     | 01-JUN-12 | 4        | 12      | 01-JUN-12 |
    | 2     | 01-JUN-12 | 5        | 12      | 01-MAY-12 |
    | 3     | 01-JUN-12 | 6        | 12      | 01-JAN-12 |
    +-------+-----------+----------+---------+-----------+
    | 1     | 01-MAY-12 | 5        | 12      | 01-MAY-12 |
    | 2     | 01-MAY-12 | 6        | 12      | 01-JAN-12 |
    +-------+-----------+----------+---------+-----------+
    | 1     | 01-JAN-12 | 6        | 12      | 01-JAN-12 |
    +-------+-----------+----------+---------+-----------+
    | 1     | 01-JAN-10 | 7        | 12      | 01-JAN-10 |
    +-------+-----------+----------+---------+-----------+
    

    So, as you can see, each of the rows appears at the top of its own hierarchy, with all nodes meeting the CONNECT BY criteria under it.

    Finally, the WHERE clause is applied; this chops off all of the levels > 3 in every hierarchy, so you're left with a maximum of 3 levels. This affects only one row in the middle hierarchy, the one with LEVEL = 4.